HeadGAN: One-shot Neural Head Synthesis and Editing

Authors: Michail Christos Doukas, Stefanos Zafeiriou, Viktoriia Sharmanska
Venue: 2021 IEEE/CVF International Conference on Computer Vision (ICCV)

Recent attempts to solve the problem of head reenactment using a single reference image have shown promising results. However, most of them either perform poorly in terms of photo-realism, fail to preserve identity, or do not fully transfer the driving pose and expression. We propose HeadGAN, a novel system that conditions synthesis on 3D face representations, which can be extracted from any driving video and adapted to the facial geometry of any reference image…

HifiHead: One-Shot High Fidelity Neural Head Synthesis with 3D Control

HifiHead, a high-fidelity neural talking head synthesis method, is proposed; it preserves the source image's appearance well and controls motion flexibly with 3D morphable face model (3DMM) parameters derived from a driving image or specified by users.

One-Shot Face Reenactment on Megapixels

This work presents a one-shot, high-resolution face reenactment method called MegaFR, designed to control source images with 3DMM parameters; the proposed method can be considered both a controllable StyleGAN and a face reenactment method.

Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control

Free-HeadGAN shows that modeling faces with sparse 3D facial landmarks is sufficient for achieving state-of-the-art generative performance, without relying on strong statistical priors of the face, such as 3D Morphable Models.

StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN

One-shot talking face generation aims at synthesizing a high-quality talking face video from an arbitrary portrait image, driven by a video or an audio segment. In this work, we provide a solution

FNeVR: Neural Volume Rendering for Face Animation

A Face Neural Volume Rendering (FNeVR) network to fully explore the potential of 2D motion warping and 3D volume rendering in a unified framework is proposed and a lightweight pose editor is designed, enabling FNeVR to edit the facial pose in a simple yet effective way.

Face2Faceρ: Real-Time High-Resolution One-Shot Face Reenactment

Face2Faceρ is introduced, the first Real-time, High-resolution and One-shot (RHO, ρ) face reenactment framework, which consists of two fast and efficient sub-networks and can produce results of equal or better visual quality, yet with significantly less time and memory overhead.

Encode-in-Style: Latent-based Video Encoding using StyleGAN2

An end-to-end facial video encoding approach that facilitates data-efficient high-quality video re-synthesis by optimizing low-dimensional edits of a single Identity-latent, which economically captures face identity, head-pose, and complex facial motions at fine levels, and thereby bypasses training and person modeling.

Micro Expression Generation with Thin-plate Spline Motion Model and Face Parsing

This paper proposes an end-to-end unsupervised motion transfer network to tackle micro-expression generation and introduces a face parsing method that pays specific attention to the eyeglasses regions to keep the deformation reasonable.

MegaPortraits: One-shot Megapixel Neural Head Avatars

This work proposes a set of new neural architectures and training methods that can leverage both medium-resolution video data and high-resolution image data to achieve the desired levels of rendered image quality and generalization to novel views and motion.

Expressive Talking Head Generation with Granular Audio-Visual Control

Granularly Controlled Audio-Visual Talking Heads (GC-AVT) is proposed, which controls the lip movements, head poses, and facial expressions of a talking head in a granular manner by decoupling the audio-visual driving sources through prior-based pre-processing designs.



Head2Head: Video-based Neural Head Synthesis

It is demonstrated that the proposed method can transfer facial expressions, pose and gaze of a source actor to a target video in a photo-realistic fashion more accurately than state-of-the-art methods.

Fast Bi-layer Neural Synthesis of One-Shot Realistic Head Avatars

A neural rendering-based system that creates head avatars from a single photograph by decomposing the image into two layers; the system is compared to analogous state-of-the-art systems in terms of visual quality and speed.

Head2Head++: Deep Facial Attributes Re-Targeting

This work uses the 3D geometry of faces and Generative Adversarial Networks to design a novel deep learning architecture for the task of facial and head reenactment and demonstrates that the proposed method can successfully transfer facial expressions, head pose and eye gaze from a source video to a target subject, in a photo-realistic and faithful fashion, better than other state-of-the-art methods.

Deep video portraits

The first approach to transfer the full 3D head position, head rotation, face expression, eye gaze, and eye blinking from a source actor to a portrait video of a target actor, using only an input video, is presented.

Rotate-and-Render: Unsupervised Photorealistic Face Rotation From Single-View Images

This work proposes a novel unsupervised framework that can synthesize photo-realistic rotated faces using only single-view image collections in the wild, and proves that rotating faces in the 3D space back and forth and re-rendering them to the 2D plane can serve as a strong self-supervision.

Face2Face: Real-Time Face Capture and Reenactment of RGB Videos

A novel approach for real-time facial reenactment of a monocular target video sequence (e.g., a YouTube video) that addresses the under-constrained problem of facial identity recovery from monocular video by non-rigid model-based bundling and re-renders the manipulated output video in a photo-realistic fashion.

DeepFaceFlow: In-the-Wild Dense 3D Facial Motion Estimation

This work proposes DeepFaceFlow, a robust, fast, and highly-accurate framework for the dense estimation of 3D non-rigid facial flow between pairs of monocular images, and incorporates its framework in a full-head state-of-the-art facial video synthesis method.

Talking Face Generation by Conditional Recurrent Adversarial Network

A novel conditional video generation network is proposed in which the audio input is treated as a condition for the recurrent adversarial network, so that temporal dependency is incorporated to realize smooth transitions in lip and facial movement.

Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss

A cascade GAN approach to generating talking face video that is robust to different face shapes, view angles, facial characteristics, and noisy audio conditions. Compared to a direct audio-to-image approach, it avoids fitting spurious correlations between audiovisual signals that are irrelevant to the speech content.

Warp-guided GANs for single-photo facial animation

This paper introduces a novel method for real-time portrait animation from a single photo that factorizes out the nonlinear geometric transformations exhibited in facial expressions by lightweight 2D warps and leaves the appearance detail synthesis to conditional generative neural networks for high-fidelity facial animation generation.