Learning an animatable detailed 3D face model from in-the-wild images
Yao Feng, Haiwen Feng, Michael J. Black, Timo Bolkart. ACM Transactions on Graphics (TOG), pages 1–13.
While current monocular 3D face reconstruction methods can recover fine geometric details, they suffer from several limitations. Some methods produce faces that cannot be realistically animated because they do not model how wrinkles vary with expression. Other methods are trained on high-quality face scans and do not generalize well to in-the-wild images. We present the first approach that regresses 3D face shape and animatable details that are specific to an individual but change with expression…
Neural Emotion Director: Speech-preserving semantic control of facial expressions in "in-the-wild" videos
This method is the first capable of controlling an actor's facial expressions using only the semantic labels of the manipulated emotions as input, while preserving the speech-related lip movements.
Realistic One-shot Mesh-based Head Avatars
This work presents ROME, a system for realistic one-shot creation of mesh-based human head avatars; it estimates a person-specific head mesh and an associated neural texture that encodes both local photometric and geometric details.
Novel View Synthesis for High-fidelity Headshot Scenes
This method learns a Generative Adversarial Network to blend a NeRF-synthesized image with a 3DMM-rendered image, producing a photorealistic scene whose face preserves skin details.
3D GAN Inversion for Controllable Portrait Image Animation
This work proposes a supervision strategy to flexibly manipulate expressions with 3D morphable models, and shows that the proposed method also supports editing appearance attributes, such as age or hairstyle, by interpolating within the latent space of the GAN.
Facial Geometric Detail Recovery via Implicit Representation
This work presents a robust texture-guided geometric detail recovery approach using only a single in-the-wild facial image and registers the implicit shape details to a 3D Morphable Model template, which can be used in traditional modeling and rendering pipelines.
S2F2: Self-Supervised High Fidelity Face Reconstruction from Monocular Image
This work achieves, for the first time, high-fidelity face reconstruction using self-supervised learning only, which allows it to solve the challenging problem of decoupling face reflectance from geometry from a single image at high computational speed.
Video-driven Neural Physically-based Facial Asset for Production
Longwen Zhang, Chuxiao Zeng, Qixuan Zhang (ShanghaiTech University and Deemos Technology, China)
Finding Directions in GAN's Latent Space for Neural Face Reenactment
The qualitative and quantitative results show that this approach often produces reenacted faces of significantly higher quality than those produced by state-of-the-art methods on the standard VoxCeleb1 and VoxCeleb2 benchmarks.
Learning-by-Novel-View-Synthesis for Full-Face Appearance-based 3D Gaze Estimation
This work examines a novel approach for synthesizing gaze-estimation training data based on monocular 3D face reconstruction, and proposes a mask-guided gaze estimation model and data augmentation strategies that further improve estimation accuracy by taking advantage of synthetic training data.
Generating Diverse 3D Reconstructions from a Single Occluded Face Image
Quantitative and qualitative comparisons of 3D reconstruction on occluded faces show that Diverse3DFace can estimate 3D shapes that are consistent with the visible regions in the target image while exhibiting high, yet realistic, levels of diversity in the occluding regions.


Annotated Facial Landmarks in the Wild: A large-scale, real-world database for facial landmark localization
AFLW provides a large-scale collection of images gathered from Flickr, exhibiting a large variety in face appearance as well as general imaging and environmental conditions, and is well suited to train and test algorithms for multi-view face detection, facial landmark localization and face pose estimation.
Towards Fast, Accurate and Stable 3D Dense Face Alignment
A novel regression framework that strikes a balance among speed, accuracy, and stability, together with a meta-joint optimization strategy to dynamically regress a small set of 3DMM parameters, greatly enhancing speed and accuracy simultaneously.
FaceScape: A Large-Scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction
Haotian Yang, Hao Zhu, Xun Cao. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
A novel algorithm is proposed that is able to predict elaborate riggable 3D face models from a single image input and learns the expression-specific dynamic details using a deep neural network.
Cross-Modal Deep Face Normals With Deactivable Skip Connections
This work proposes a method that can leverage all available image and normal data, whether paired or not, thanks to a novel cross-modal learning architecture that allows learning of a rich latent space that can accurately capture the normal information.
Extreme 3D Face Reconstruction: Seeing Through Occlusions
A layered approach is proposed that decouples estimation of a global shape from its mid-level details (e.g., wrinkles) and then layers this foundation with fine details represented by a bump map, motivated by the concept of bump mapping.
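The bump-mapping idea referenced above can be sketched as perturbing per-pixel surface normals with the gradient of a scalar height map. This is a generic illustration of the classic technique, not the paper's implementation:

```python
import numpy as np

def perturb_normals(base_normals, bump, strength=1.0):
    """Perturb unit surface normals with gradients of a scalar bump map.

    base_normals: (H, W, 3) array of unit normals.
    bump:         (H, W) scalar height/bump map.
    strength:     scale of the perturbation.
    """
    # Finite-difference gradients of the bump map (axis 0 = y, axis 1 = x).
    db_dy, db_dx = np.gradient(bump)
    perturbed = base_normals.astype(float).copy()
    # Offset the tangent-plane components by the height-map gradient.
    perturbed[..., 0] -= strength * db_dx
    perturbed[..., 1] -= strength * db_dy
    # Renormalize to unit length.
    norm = np.linalg.norm(perturbed, axis=-1, keepdims=True)
    return perturbed / np.clip(norm, 1e-8, None)
```

Applied to a smooth base mesh's rendered normals, such a perturbation adds the appearance of fine wrinkles without changing the underlying geometry.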
Joint 3D Face Reconstruction and Dense Alignment with Position Map Regression Network
A straightforward method that simultaneously reconstructs the 3D facial structure and provides dense alignment, surpassing other state-of-the-art methods on both reconstruction and alignment tasks by a large margin.
Learning to Regress 3D Face Shape and Expression From an Image Without 3D Supervision
To train a network without any 2D-to-3D supervision, RingNet is presented, which learns to compute 3D face shape from a single image and achieves invariance to expression by representing the face using the FLAME model.
Learning a model of facial shape and expression from 4D scans
FLAME (Faces Learned with an Articulated Model and Expressions) is low-dimensional but more expressive than the FaceWarehouse model and the Basel Face Model; it is compared to these models by fitting them to static 3D scans and 4D sequences using the same optimization method.
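As a rough illustration of how such a low-dimensional model represents faces, the core of a linear morphable model is a template mesh plus weighted shape and expression blendshapes. This sketch is deliberately simplified: FLAME additionally models articulated jaw, neck, and eyeball pose via linear blend skinning, which is omitted here.

```python
import numpy as np

def morphable_face(template, shape_basis, expr_basis, shape_coeffs, expr_coeffs):
    """Linear morphable-model sketch: template plus shape/expression offsets.

    template:     (V, 3) mean face vertices.
    shape_basis:  (S, V, 3) identity blendshapes.
    expr_basis:   (E, V, 3) expression blendshapes.
    shape_coeffs: (S,) identity coefficients.
    expr_coeffs:  (E,) expression coefficients.
    """
    verts = template.astype(float).copy()
    verts += np.tensordot(shape_coeffs, shape_basis, axes=1)  # identity offsets
    verts += np.tensordot(expr_coeffs, expr_basis, axes=1)    # expression offsets
    return verts
```

Fitting the model to a scan then amounts to optimizing the low-dimensional coefficient vectors so the output vertices match the observed surface.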
Accurate 3D Face Reconstruction With Weakly-Supervised Learning: From Single Image to Image Set
A novel deep 3D face reconstruction approach is proposed that leverages a robust, hybrid loss function for weakly-supervised learning, taking into account both low-level and perception-level information for supervision, and that performs multi-image face reconstruction by exploiting complementary information from different images for shape aggregation.
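One simple way to aggregate shape estimates from multiple images of the same person, in the spirit of the summary above, is a confidence-weighted average of per-image identity coefficients. This is a generic sketch under that assumption, not the paper's exact scheme:

```python
import numpy as np

def aggregate_shape(coeffs, confidences):
    """Confidence-weighted average of per-image identity coefficients.

    coeffs:       (N, D) identity coefficients regressed from N images.
    confidences:  (N,) nonnegative per-image confidence scores.
    """
    w = np.asarray(confidences, dtype=float)
    w = w / w.sum()  # normalize weights to sum to one
    return w @ np.asarray(coeffs, dtype=float)  # -> (D,) aggregated coefficients
```

Images with low confidence (e.g., occluded or extreme-pose views) then contribute little to the final identity estimate.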
Real-time high-fidelity facial performance capture
This work proposes an automatic way to detect and align the local patches required to train the regressors and to run them efficiently, resulting in high-fidelity facial performance reconstruction with person-specific wrinkle details from a monocular video camera in real time.