Head2Head++: Deep Facial Attributes Re-Targeting

  • Michail Christos Doukas, Mohammad Rami Koujan, Viktoriia Sharmanska, Anastasios Roussos, Stefanos Zafeiriou
  • IEEE Transactions on Biometrics, Behavior, and Identity Science
Facial video re-targeting is a challenging problem that aims to seamlessly modify the facial attributes of a target subject, driven by a monocular source sequence. We leverage the 3D geometry of faces and Generative Adversarial Networks (GANs) to design a novel deep learning architecture for the task of facial and head reenactment. Our method differs from purely 3D model-based approaches and from recent image-based methods that use Deep Convolutional Neural Networks (DCNNs) to generate individual…

Neural Sign Reenactor: Deep Photorealistic Sign Language Retargeting

This paper introduces a neural rendering pipeline for transferring the facial expressions, head pose and body movements of one person in a source video to another person in a target video, yielding promising results of unprecedented realism for anonymization.

HeadGAN: One-shot Neural Head Synthesis and Editing

This work proposes HeadGAN, a novel system that conditions synthesis on 3D face representations, which can be extracted from any driving video and adapted to the facial geometry of any reference image, disentangling identity from expression.

Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control

Free-HeadGAN shows that modeling faces with sparse 3D facial landmarks is sufficient for achieving state-of-the-art generative performance, without relying on strong statistical priors of the face, such as 3D Morphable Models.

Neural Emotion Director: Speech-preserving semantic control of facial expressions in “in-the-wild” videos

This method is the first capable of controlling the actor’s facial expressions using only the semantic labels of the manipulated emotions as input, while preserving the speech-related lip movements.

RigNeRF: Fully Controllable Neural 3D Portraits

  • ShahRukh Athar
  • Computer Science
    2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2022
This work proposes RigNeRF, a system that goes beyond novel view synthesis and enables full control of head pose and facial expressions learned from a single portrait video, and demonstrates the effectiveness of the method on free view synthesis of a portrait scene with explicit head pose and expression controls.

Progressive Transformer Machine for Natural Character Reenactment

A progressive transformer module is designed that introduces multi-head self-attention with convolution refinement to simultaneously capture global-local dependencies, facilitating the generation of video frames that are globally natural while preserving sharp outlines and rich detail.

Deep Semantic Manipulation of Facial Videos

This paper proposes the first method to perform photorealistic manipulation of facial expressions in videos based on a disentangled representation and estimation of the 3D facial shape and activity, and introduces a user-friendly, interactive AI tool that processes human-readable semantic labels about the desired emotion manipulations in specific parts of the input video and synthesizes photorealistic manipulated videos.

Deep Person Generation: A Survey from the Perspective of Face, Pose and Cloth Synthesis

The scope of person generation is summarized, and recent progress and technical trends in deep person generation are systematically reviewed, covering three major tasks: talking-head generation (face), pose-guided person generation (pose) and garment-oriented person generation (cloth).

Enriching Facial Anti-Spoofing Datasets via an Effective Face Swapping Framework

An effective face swapping framework based on StyleGAN is proposed, which performs high-quality face swapping and provides data for deep forgery detection to help ensure the security of multimedia systems.

FLAME-in-NeRF : Neural control of Radiance Fields for Free View Face Animation

This work designs a system that enables both novel view synthesis for portrait video, including the human subject and the scene background, and explicit control of the facial expressions through a low-dimensional expression representation, and imposes a spatial prior brought by 3DMM fitting to guide the network to learn disentangled control for scene appearance and facial actions.

DeepFaceFlow: In-the-Wild Dense 3D Facial Motion Estimation

This work proposes DeepFaceFlow, a robust, fast, and highly-accurate framework for the dense estimation of 3D non-rigid facial flow between pairs of monocular images, and incorporates its framework in a full-head state-of-the-art facial video synthesis method.

Head2Head: Video-based Neural Head Synthesis

It is demonstrated that the proposed method can transfer facial expressions, pose and gaze of a source actor to a target video in a photo-realistic fashion more accurately than state-of-the-art methods.

Deep video portraits

The first method to transfer the full 3D head position, head rotation, face expression, eye gaze, and eye blinking from a source actor to a portrait video of a target actor, using only an input video, is presented.

Face2Face: Real-Time Face Capture and Reenactment of RGB Videos

A novel approach for real-time facial reenactment of a monocular target video sequence (e.g., a YouTube video) that addresses the under-constrained problem of facial identity recovery from monocular video via non-rigid model-based bundling, and re-renders the manipulated output video in a photo-realistic fashion.

FML: Face Model Learning From Videos

This work proposes multi-frame video-based self-supervised training of a deep network that learns a face identity model both in shape and appearance while jointly learning to reconstruct 3D faces.

Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks

A deep cascaded multitask framework that exploits the inherent correlation between detection and alignment to boost their performance, achieving superior accuracy over state-of-the-art techniques on challenging face detection datasets and benchmarks.

Self-Supervised Multi-level Face Model Learning for Monocular Reconstruction at Over 250 Hz

The first approach that jointly learns a regressor for face shape, expression, reflectance and illumination on the basis of a concurrently learned parametric face model is presented; it compares favorably to the state-of-the-art in terms of reconstruction quality, generalizes better to real-world faces, and runs at over 250 Hz.

ReenactNet: Real-time Full Head Reenactment

This work proposes a head-to-head system of their own implementation capable of fully transferring the human head 3D pose, facial expressions and eye gaze from a source to a target actor, while preserving the identity of the target actor.

FaceForensics++: Learning to Detect Manipulated Facial Images

This paper proposes an automated benchmark for facial manipulation detection, and shows that the use of additional domain-specific knowledge improves forgery detection to unprecedented accuracy, even in the presence of strong compression, and clearly outperforms human observers.

Real-time Facial Expression Recognition “In The Wild” by Disentangling 3D Expression from Identity

An extensive experimental evaluation shows that the proposed method outperforms the compared techniques in estimating the 3D expression parameters and achieves state-of-the-art performance in recognising the basic emotions from facial images, as well as recognising stress from facial videos.