MegaPortraits: One-shot Megapixel Neural Head Avatars

Nikita Drobyshev, Jenya Chelishev, Taras Khakhulin, Aleksei Ivakhnenko, Victor S. Lempitsky, and Egor Zakharov. Proceedings of the 30th ACM International Conference on Multimedia.

In this work, we advance the neural head avatar technology to the megapixel resolution while focusing on the particularly challenging task of cross-driving synthesis, i.e., when the appearance of the driving image is substantially different from the animated source image. We propose a set of new neural architectures and training methods that can leverage both medium-resolution video data and high-resolution image data to achieve the desired levels of rendered image quality and generalization to…

Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement

This work proposes an end-to-end framework for synthesizing high-quality virtual human faces capable of speech with a special emphasis on performance, and introduces a novel network utilizing visemes as an intermediate audio representation and a novel data augmentation strategy employing a hierarchical image synthesis approach.

ManVatar: Fast 3D Head Avatar Reconstruction Using Motion-Aware Neural Voxels

ManVatar is the first method to decouple expression motion from canonical appearance for head avatars, modeling the expression motion with neural voxels; it can recover photo-realistic head avatars in just 5 minutes, faster than state-of-the-art facial reenactment methods.

MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation

This work proposes an ID-preserving talking head generation framework that surpasses prior methods in generation fidelity on established benchmarks and adaptively fuses the source identity during synthesis, so that the network better preserves the key characteristics of the image portrait.

Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis

A novel one-shot talking head synthesis method that achieves disentangled and fine-grained control over lip motion, eye gaze and blink, head pose, and emotional expression, introducing motion-specific contrastive learning and regression for non-emotional motions, and feature-level decorrelation and self-reconstruction for emotional expression.

Unsupervised Volumetric Animation

This work proposes a novel approach for unsupervised 3D animation of non-rigid deformable objects, using a 3D autodecoder framework paired with a keypoint estimator via a differentiable PnP algorithm to learn the 3D structure and dynamics of objects solely from single-view RGB videos.

DINER: Depth-aware Image-based NEural Radiance fields

Given a sparse set of RGB input views, DINER achieves higher synthesis quality and can process input views with greater disparity, which allows us to capture scenes more completely without changing capturing hardware requirements and ultimately enables larger viewpoint changes during novel view synthesis.

Learning Detailed Radiance Manifolds for High-Fidelity and 3D-Consistent Portrait Synthesis from Monocular Image

This work presents a 3D-consistent novel view synthesis approach for monocular portrait images based on a recently proposed 3D-aware GAN, namely Generative Radiance Manifolds (GRAM), which has shown strong 3D consistency in multiview image generation of virtual subjects via the radiance manifolds representation.

DPE: Disentanglement of Pose and Expression for General Video Portrait Editing


SoK: Data Privacy in Virtual Reality

This paper aims to systematize knowledge on the landscape of VR privacy threats and countermeasures by proposing a comprehensive taxonomy of data attributes, protections, and adversaries based on the study of 68 collected publications.

Deepfake CAPTCHA: A Method for Preventing Fake Calls

D-CAPTCHA is proposed: an active defense against real-time deepfakes that challenges the AI's ability to create content, as opposed to its ability to classify content, and outperforms state-of-the-art audio deepfake detectors.

Fast Bi-layer Neural Synthesis of One-Shot Realistic Head Avatars

A neural rendering-based system that creates head avatars from a single photograph by decomposing the image into two layers, and compares favorably with analogous state-of-the-art systems in terms of visual quality and speed.

Deep video portraits

This work presents the first method to transfer the full 3D head position, head rotation, face expression, eye gaze, and eye blinking from a source actor to a portrait video of a target actor using only an input video.

HeadGAN: One-shot Neural Head Synthesis and Editing

This work proposes HeadGAN, a novel system that conditions synthesis on 3D face representations, which can be extracted from any driving video and adapted to the facial geometry of any reference image, disentangling identity from expression.

One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing

This work proposes a neural talking-head video synthesis model that learns to synthesize a talking-head video using a source image containing the target person's appearance and a driving video that dictates the motion in the output.

Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction

This work combines a scene representation network with a low-dimensional morphable model which provides explicit control over pose and expressions and shows that this learned volumetric representation allows for photorealistic image generation that surpasses the quality of state-of-the-art video-based reenactment methods.

Face2Face: Real-Time Face Capture and Reenactment of RGB Videos

A novel approach for real-time facial reenactment of a monocular target video sequence (e.g., a YouTube video) that addresses the under-constrained problem of facial identity recovery from monocular video via non-rigid model-based bundling and re-renders the manipulated output video in a photo-realistic fashion.

Animating Arbitrary Objects via Deep Motion Transfer

This paper introduces a novel deep learning framework for image animation that generates a video in which the target object is animated according to the driving sequence through a deep architecture that decouples appearance and motion information.

Video-to-Video Synthesis

This paper proposes a novel video-to-video synthesis approach under the generative adversarial learning framework, capable of synthesizing 2K resolution videos of street scenes up to 30 seconds long, which significantly advances the state-of-the-art of video synthesis.

High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs

A new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs) is presented, which significantly outperforms existing methods, advancing both the quality and the resolution of deep image synthesis and editing.

Perceptual Losses for Real-Time Style Transfer and Super-Resolution

This work considers image transformation problems and proposes the use of perceptual loss functions for training feed-forward networks for image transformation tasks, showing results on image style transfer, where a feed-forward network is trained to solve, in real time, the optimization problem proposed by Gatys et al.