HeadGAN: One-shot Neural Head Synthesis and Editing

Michail Christos Doukas, Stefanos Zafeiriou, Viktoriia Sharmanska. IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
Recent attempts to solve head reenactment from a single reference image have shown promising results. However, most of them either perform poorly in terms of photo-realism, fail to preserve the source identity, or do not fully transfer the driving pose and expression. We propose HeadGAN, a novel system that conditions synthesis on 3D face representations, which can be extracted from any driving video and adapted to the facial geometry of any reference image…

HifiHead: One-Shot High Fidelity Neural Head Synthesis with 3D Control

HifiHead, a high-fidelity neural talking-head synthesis method, is proposed; it preserves the source image's appearance well and controls motion flexibly with 3D morphable face model (3DMM) parameters derived from a driving image or specified by users.

One-Shot Face Reenactment on Megapixels

This work presents MegaFR, a one-shot, high-resolution face reenactment method designed to control source images with 3DMM parameters; the proposed method can be viewed both as a controllable StyleGAN and as a face reenactment method.

Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control

Free-HeadGAN shows that modeling faces with sparse 3D facial landmarks is sufficient for achieving state-of-the-art generative performance, without relying on strong statistical priors of the face, such as 3D Morphable Models.

FNeVR: Neural Volume Rendering for Face Animation

A Face Neural Volume Rendering (FNeVR) network is proposed to fully explore the potential of 2D motion warping and 3D volume rendering in a unified framework, along with a lightweight pose editor that enables FNeVR to edit the facial pose in a simple yet effective way.
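The 2D motion warping that several of these methods build on can be illustrated with a minimal backward-warping sketch in plain NumPy. This is a generic illustration, not any paper's actual code; the function name and the flow convention (channel 0 = horizontal displacement, channel 1 = vertical) are assumptions.

```python
import numpy as np

def warp_image(img, flow):
    """Backward-warp a grayscale image by a dense 2D flow field.

    img:  (H, W) float array.
    flow: (H, W, 2) per-pixel displacement; output pixel (y, x) samples
          the input at (y + flow[y, x, 1], x + flow[y, x, 0]).
    Uses bilinear interpolation with border clamping.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    src_x = np.clip(xs + flow[..., 0], 0, w - 1)
    src_y = np.clip(ys + flow[..., 1], 0, h - 1)
    # Integer corners around each sampling location.
    x0 = np.floor(src_x).astype(int); x1 = np.clip(x0 + 1, 0, w - 1)
    y0 = np.floor(src_y).astype(int); y1 = np.clip(y0 + 1, 0, h - 1)
    # Bilinear weights.
    wx = src_x - x0
    wy = src_y - y0
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bot = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bot * wy
```

In the actual models the flow field is predicted by a network (e.g. from the driving 3D face representation) and the warp is applied differentiably to feature maps rather than raw pixels, but the sampling primitive is the same.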

Encode-in-Style: Latent-based Video Encoding using StyleGAN2

An end-to-end facial video encoding approach is presented that facilitates data-efficient, high-quality video re-synthesis by optimizing low-dimensional edits of a single identity latent, which economically captures face identity, head pose, and complex facial motions at fine levels, thereby bypassing training and person-specific modeling.

Expressive Talking Head Generation with Granular Audio-Visual Control

Granularly Controlled Audio-Visual Talking Heads (GC-AVT) is proposed to control the lip movements, head pose, and facial expressions of a talking head in a granular manner, decoupling the audio-visual driving sources through prior-based pre-processing designs.

MegaPortraits: One-shot Megapixel Neural Head Avatars

This work proposes a set of new neural architectures and training methods that can leverage both medium-resolution video data and high-resolution image data to achieve the desired levels of rendered image quality and generalization to novel views and motion.

StableFace: Analyzing and Improving Motion Stability for Talking Face Generation

This paper conducts systematic analyses on the motion jittering problem based on a state-of-the-art pipeline that uses 3D face representations to bridge the input audio and output video, and proposes three effective solutions to address the issue.
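One common family of fixes for motion jitter in such pipelines is temporal smoothing of the per-frame 3D face parameters before rendering. The sketch below is a generic illustration of this idea with an exponential moving average, not StableFace's actual solution; the helper name is hypothetical.

```python
def ema_smooth(values, alpha=0.8):
    """Exponentially smooth a sequence of per-frame scalars.

    alpha in [0, 1): weight on the previous smoothed value. Higher alpha
    gives stabler but laggier output. For 3D face parameters (pose,
    expression coefficients) this would be applied elementwise per frame.
    """
    smoothed = []
    prev = None
    for v in values:
        prev = v if prev is None else alpha * prev + (1 - alpha) * v
        smoothed.append(prev)
    return smoothed
```

The design trade-off is latency versus stability: heavier smoothing suppresses frame-to-frame jitter but makes fast expression changes lag behind the audio.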

Identity-Referenced Deepfake Detection with Contrastive Learning

This work proposes using real images of the same identity as a reference to improve detection performance on both FaceForensics++ and Celeb-DF with relatively little training data, and achieves very competitive results on cross-manipulation and cross-dataset evaluations.

Study of detecting behavioral signatures within DeepFake videos

There is strong interest in the generation of synthetic video imagery of people talking for various purposes, including entertainment, communication, training, and advertisement. With the development…

Head2Head: Video-based Neural Head Synthesis

It is demonstrated that the proposed method can transfer facial expressions, pose and gaze of a source actor to a target video in a photo-realistic fashion more accurately than state-of-the-art methods.

Fast Bi-layer Neural Synthesis of One-Shot Realistic Head Avatars

A neural rendering-based system is presented that creates head avatars from a single photograph by decomposing the image into two layers; it is compared to analogous state-of-the-art systems in terms of visual quality and speed.

Head2Head++: Deep Facial Attributes Re-Targeting

This work uses the 3D geometry of faces and Generative Adversarial Networks to design a novel deep learning architecture for facial and head reenactment. It demonstrates that the proposed method can transfer facial expressions, head pose, and eye gaze from a source video to a target subject in a photo-realistic and faithful fashion, better than other state-of-the-art methods.

Deep video portraits

The first method to transfer the full 3D head position, head rotation, face expression, eye gaze, and eye blinking from a source actor to a portrait video of a target actor, using only an input video, is presented.

Rotate-and-Render: Unsupervised Photorealistic Face Rotation From Single-View Images

This work proposes a novel unsupervised framework that can synthesize photo-realistic rotated faces using only single-view image collections in the wild, and proves that rotating faces in the 3D space back and forth and re-rendering them to the 2D plane can serve as a strong self-supervision.

Face2Face: Real-Time Face Capture and Reenactment of RGB Videos

A novel approach for real-time facial reenactment of a monocular target video sequence (e.g., a YouTube video) is presented, which addresses the under-constrained problem of facial identity recovery from monocular video via non-rigid model-based bundling and re-renders the manipulated output video in a photo-realistic fashion.

DeepFaceFlow: In-the-Wild Dense 3D Facial Motion Estimation

This work proposes DeepFaceFlow, a robust, fast, and highly-accurate framework for the dense estimation of 3D non-rigid facial flow between pairs of monocular images, and incorporates its framework in a full-head state-of-the-art facial video synthesis method.

Talking Face Generation by Conditional Recurrent Adversarial Network

A novel conditional video generation network is proposed in which the audio input is treated as a condition for a recurrent adversarial network, so that temporal dependency is incorporated to realize smooth transitions in lip and facial movement.

Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss

A cascade GAN approach is proposed to generate talking-face video that is robust to different face shapes, view angles, facial characteristics, and noisy audio conditions; compared to a direct audio-to-image approach, it avoids fitting spurious correlations between audiovisual signals that are irrelevant to the speech content.

Warp-guided GANs for single-photo facial animation

This paper introduces a novel method for real-time portrait animation from a single photo that factorizes out the nonlinear geometric transformations exhibited in facial expressions via lightweight 2D warps, leaving appearance-detail synthesis to conditional generative neural networks for high-fidelity facial animation generation.