• Corpus ID: 238582746

Self-Supervised 3D Face Reconstruction via Conditional Estimation

  • Yandong Wen, Weiyang Liu, Bhiksha Raj, Rita Singh
  • Published 10 October 2021
  • Computer Science, Engineering
  • ArXiv
We present a conditional estimation (CEST) framework to learn 3D facial parameters from 2D single-view images by self-supervised training from videos. CEST is based on the process of analysis by synthesis, where the 3D facial parameters (shape, reflectance, viewpoint, and illumination) are estimated from the face image, and then recombined to reconstruct the 2D face image. In order to learn semantically meaningful 3D facial parameters without explicit access to their labels, CEST couples the… 
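As a rough illustration of the analysis-by-synthesis loop the abstract describes (estimate parameters, recombine them into an image, compare with the input), the sketch below renders a toy image from per-pixel normals, reflectance, and a light direction, then scores it with a photometric loss. It is not the CEST implementation: the Lambertian shading model, the fixed frontal viewpoint, the L1 loss, and the names `render_lambertian` and `photometric_loss` are all assumptions made for this example, and the conditional coupling between parameters is not modeled.

```python
import numpy as np

def render_lambertian(normals, albedo, light_dir):
    """Recombine parameters into an image: albedo * max(0, n . l).

    normals:   (H, W, 3) unit surface normals (stand-in for shape)
    albedo:    (H, W, 3) per-pixel reflectance
    light_dir: (3,) directional light (stand-in for illumination)
    Viewpoint is assumed fixed and frontal in this toy example.
    """
    l = light_dir / np.linalg.norm(light_dir)
    shading = np.clip(normals @ l, 0.0, None)   # (H, W)
    return albedo * shading[..., None]          # (H, W, 3)

def photometric_loss(rendered, target):
    """Mean absolute pixel difference between reconstruction and input."""
    return float(np.mean(np.abs(rendered - target)))

# Toy data: a flat patch facing the camera, lit head-on.
H, W = 4, 4
normals = np.zeros((H, W, 3))
normals[..., 2] = 1.0
albedo = np.full((H, W, 3), 0.5)
light = np.array([0.0, 0.0, 2.0])

target = render_lambertian(normals, albedo, light)

# Correct parameters reproduce the input; a wrong reflectance estimate does not.
loss_same = photometric_loss(render_lambertian(normals, albedo, light), target)
loss_wrong = photometric_loss(render_lambertian(normals, albedo * 0.5, light), target)
```

In a self-supervised setup, a loss of this kind would be minimized with respect to the network that predicts the parameters, so no parameter labels are required.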
Synergy between 3DMM and 3D Landmarks for Accurate 3D Facial Geometry
This work studies learning from a synergy process of 3D Morphable Models (3DMM) and 3D facial landmarks to predict complete 3D facial geometry, including 3D alignment, face orientation, and 3D face


Self-Supervised Monocular 3D Face Reconstruction by Occlusion-Aware Multi-view Geometry Consistency
This work proposes an occlusion-aware view synthesis method to apply multi-view geometry consistency to self-supervised learning, and designs three novel loss functions for multi-view consistency: the pixel consistency loss, the depth consistency loss, and the facial landmark-based epipolar loss.
Learning to Regress 3D Face Shape and Expression From an Image Without 3D Supervision
This work presents RingNet, which is trained without any 2D-to-3D supervision, learns to compute 3D face shape from a single image, and achieves invariance to expression by representing the face using the FLAME model.
Self-Supervised Multi-level Face Model Learning for Monocular Reconstruction at Over 250 Hz
This work presents the first approach that jointly learns a regressor for face shape, expression, reflectance, and illumination on the basis of a concurrently learned parametric face model. It compares favorably to the state of the art in reconstruction quality, generalizes better to real-world faces, and runs at over 250 Hz.
FML: Face Model Learning From Videos
This work proposes multi-frame video-based self-supervised training of a deep network that learns a face identity model both in shape and appearance while jointly learning to reconstruct 3D faces.
Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression
This work proposes to address many of these limitations by training a Convolutional Neural Network (CNN) on an appropriate dataset consisting of 2D images and 3D facial models or scans, and achieves this via a simple CNN architecture that performs direct regression of a volumetric representation of the 3D facial geometry from a single 2D image.
On Learning 3D Face Morphable Model from In-the-Wild Images
  • Luan Tran, Xiaoming Liu
  • Computer Science, Medicine
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2021
This paper proposes an innovative framework to learn a nonlinear 3DMM from a large set of in-the-wild face images, without collecting 3D face scans, and demonstrates the superior representation power of the nonlinear 3DMM over its linear counterpart, as well as its contribution to face alignment, 3D reconstruction, and face editing.
On Learning 3D Face Morphable Model from In-the-wild Images
As a classic statistical model of 3D facial shape and albedo, 3D Morphable Model (3DMM) is widely used in facial analysis, e.g., model fitting, image synthesis. Conventional 3DMM is learned from a
Joint 3D Face Reconstruction and Dense Alignment with Position Map Regression Network
This work presents a straightforward method that simultaneously reconstructs the 3D facial structure and provides dense alignment; it surpasses other state-of-the-art methods on both reconstruction and alignment tasks by a large margin.
3D Face Morphable Models "In-the-Wild"
This paper proposes the first, to the best of the authors' knowledge, in-the-wild 3DMM by combining a powerful statistical model of facial shape, which describes both identity and expression, with an in-the-wild texture model, and demonstrates the first 3D facial database with relatively unconstrained conditions.
Corrective 3D reconstruction of lips from monocular video
This work quantitatively and qualitatively shows that the monocular approach reconstructs higher quality lip shapes, even for complex shapes like a kiss or lip rolling, than previous monocular approaches, and generalizes to new individuals and general scenes, enabling high-fidelity reconstruction even from commodity video footage.