Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose

Xianfang Zeng, Yusu Pan, Mengmeng Wang, Jiangning Zhang, Yong Liu
Recent works have shown how realistic talking-face images can be obtained under the supervision of geometry guidance, e.g., facial landmarks or boundaries. To alleviate the demand for manual annotations, in this paper we propose a novel self-supervised hybrid model (DAE-GAN) that learns to reenact faces naturally given large amounts of unlabeled videos. Our approach combines two deforming autoencoders with the latest advances in conditional generation. On the one hand, we adopt the… 
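The deforming-autoencoder idea the abstract builds on can be illustrated with a toy NumPy sketch (this is not the paper's implementation; the function name, grid size, and deformation are illustrative): appearance is stored in template coordinates, and a dense deformation field warps it back to image coordinates via bilinear sampling, so pose changes become edits of the field while the appearance template stays fixed.

```python
import numpy as np

def warp_bilinear(appearance, grid_x, grid_y):
    """Bilinearly sample `appearance` (H x W) at real-valued coords (grid_y, grid_x)."""
    H, W = appearance.shape
    x0 = np.clip(np.floor(grid_x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(grid_y).astype(int), 0, H - 2)
    dx = np.clip(grid_x - x0, 0.0, 1.0)  # fractional offsets inside each cell
    dy = np.clip(grid_y - y0, 0.0, 1.0)
    a = appearance
    top = a[y0, x0] * (1 - dx) + a[y0, x0 + 1] * dx
    bot = a[y0 + 1, x0] * (1 - dx) + a[y0 + 1, x0 + 1] * dx
    return top * (1 - dy) + bot * dy

H, W = 8, 8
appearance = np.arange(H * W, dtype=float).reshape(H, W)  # stand-in "texture" template
ys, xs = np.meshgrid(np.arange(H, dtype=float),
                     np.arange(W, dtype=float), indexing="ij")

identity = warp_bilinear(appearance, xs, ys)       # zero deformation: reconstruct as-is
shifted = warp_bilinear(appearance, xs + 1.0, ys)  # "pose" edit: shift sampling right by 1 px
```

In a full deforming autoencoder, one decoder branch predicts the appearance template and another predicts the deformation grid; only the warp at the end couples them, which is what lets identity (appearance) and pose (deformation) disentangle.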


FaceController: Controllable Attribute Editing for Face in the Wild
This work proposes a simple feed-forward network that generates high-fidelity manipulated faces in which one or more desired attributes are edited while other details are preserved, by employing existing, easily obtainable prior information.
SelFSR: Self-Conditioned Face Super-Resolution in the Wild via Flow Field Degradation Network
A novel domain-adaptive degradation network for face super-resolution in the wild that achieves state-of-the-art performance on both CelebA and a real-world face dataset, and presents a self-conditioned block for the super-resolution network.
End-to-End 3D Facial Shape Reconstruction From an Unconstrained Image
A novel architecture learns face models in nonlinear spaces, using a convolutional-neural-network-based end-to-end scheme that reconstructs 3D facial shape from a single-view 2D image and improves the quality of the reconstructed face model.
A comprehensive survey on semantic facial attribute editing using generative adversarial networks
This paper surveys the recent works and advances in semantic facial attribute editing and covers all related aspects of these models including the related definitions and concepts, architectures, loss functions, datasets, evaluation metrics, and applications.
Deep Person Generation: A Survey from the Perspective of Face, Pose and Cloth Synthesis
This survey summarizes the scope of person generation and systematically reviews recent progress and technical trends in deep person generation, covering three major tasks: talking-head generation (face), pose-guided person generation (pose), and garment-oriented person generation (cloth).
Test-Time Personalization with a Transformer for Human Pose Estimation
This work proposes to personalize a 2D human pose estimator given a set of test images of a person without using any manual annotations and improves the pose by transforming the updated self-supervised keypoints.
One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning
An Audio-Visual Correlation Transformer (AVCT) is developed that infers talking motions, represented by keypoint-based dense motion fields, from an input audio signal and can inherently generalize to audio spoken by other identities.
Region-Aware Face Swapping
A novel Region-Aware Face Swapping (RAFSwap) network is presented to achieve identity-consistent, harmonious high-resolution face generation in a local-global manner, and a Face Mask Predictor module incorporated with StyleGAN2 is proposed to predict identity-relevant soft facial masks in an unsupervised manner, which is more practical for generating harmonious high-resolution faces.
Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation
A novel text-based talking-head video generation framework that synthesizes high-fidelity facial expressions and head motions in accordance with contextual sentiments as well as speech rhythm and pauses and employs time-aligned texts instead of acoustics features to alleviate the timbre gap issue.
3D Face Reconstruction in Deep Learning Era: A Survey
  • Sahil Sharma, Vijay Kumar
  • Computer Science
    Archives of Computational Methods in Engineering: State of the Art Reviews
  • 2022
An in-depth analysis of 3D face reconstruction using deep learning techniques and the performance analysis of different face reconstruction techniques in terms of software, hardware, pros and cons is provided.


FaceSwapNet: Landmark Guided Many-to-Many Face Reenactment
A novel many-to-many face reenactment framework, named FaceSwapNet, which allows transferring facial expressions and movements from one source face to arbitrary targets and a novel triplet perceptual loss is proposed to force the generator to learn geometry and appearance information simultaneously.
Self-supervised learning of a facial attribute embedding from video
A network is introduced that is trained to embed multiple frames from the same video face-track into a common low-dimensional space and learns a meaningful face embedding that encodes information about head pose, facial landmarks and facial expression, without having been supervised with any labelled data.
MoFA: Model-Based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction
A novel model-based deep convolutional autoencoder that addresses the highly challenging problem of reconstructing a 3D human face from a single in-the-wild color image and can be trained end-to-end in an unsupervised manner, which renders training on very large real world data feasible.
ReenactGAN: Learning to Reenact Faces via Boundary Transfer
The proposed method, known as ReenactGAN, is capable of transferring facial movements and expressions from an arbitrary person's monocular video input to a target person’s video, and can perform photo-realistic face reenactment.
Deforming Autoencoders: Unsupervised Disentangling of Shape and Appearance
A more powerful form of unsupervised disentangling becomes possible in template coordinates, allowing us to successfully decompose face images into shading and albedo, and further manipulate face images.
Self-Supervised Representation Learning From Videos for Facial Action Unit Detection
Experimental results demonstrate that the learned representation is discriminative for AU detection, where TCAE outperforms or is comparable with the state-of-the-art self-supervised learning methods and supervised AU detection methods.
Face2Face: real-time face capture and reenactment of RGB videos
A novel approach for real-time facial reenactment of a monocular target video sequence (e.g., a YouTube video) that addresses the under-constrained problem of facial identity recovery from monocular video by non-rigid model-based bundling, and re-renders the manipulated output video in a photo-realistic fashion.
Few-Shot Adversarial Learning of Realistic Neural Talking Head Models
This work presents a system that performs lengthy meta-learning on a large dataset of videos, and is able to frame few- and one-shot learning of neural talking head models of previously unseen people as adversarial training problems with high capacity generators and discriminators.
PFLD: A Practical Facial Landmark Detector
This paper investigates a neat model with promising detection accuracy under wild environments (e.g., unconstrained pose, expression, lighting, and occlusion conditions) and super real-time speed on a mobile device.
Photorealistic Facial Texture Inference Using Deep Neural Networks
A data-driven inference method is presented that can synthesize a photorealistic texture map of a complete 3D face model given a partial 2D view of a person in the wild and successful face reconstructions from a wide range of low resolution input images are demonstrated.