Robust Egocentric Photo-realistic Facial Expression Transfer for Virtual Reality

@inproceedings{Jourabloo2021RobustEP,
  title={Robust Egocentric Photo-realistic Facial Expression Transfer for Virtual Reality},
  author={Amin Jourabloo and Fernando De la Torre and Jason M. Saragih and Shih-En Wei and Tenia Wang and Stephen Lombardi and Danielle Belko and Autumn Trimble and Hern{\'a}n Badino},
  booktitle={2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022},
  pages={20291-20300}
}
Social presence, the feeling of being there with a “real” person, will fuel the next generation of communication systems driven by digital humans in virtual reality (VR). The best 3D video-realistic VR avatars that minimize the uncanny effect rely on person-specific (PS) models. However, these PS models are time-consuming to build and are typically trained with limited data variability, which results in poor generalization and robustness. Major sources of variability that affect the accuracy…

Hooked on the metaverse? Exploring the prevalence of addiction to virtual reality applications

As with past debates about other new media technologies, the popularization of virtual reality (VR) technologies has raised concerns about their potential to breed media addiction.

Leveraging Deepfakes to Close the Domain Gap between Real and Synthetic Images in Facial Capture Pipelines

This work presents a method for automatic data curation and retrieval based on a hierarchical clustering framework typical of collision detection algorithms in traditional computer graphics pipelines, and it uses synthetic turntables and deepfake technology to build a synthetic multi-view stereo pipeline for appearance capture.

References


Empirical Evaluation of Rectified Activations in Convolutional Network

The experiments suggest that incorporating a non-zero slope for the negative part of rectified activation units can consistently improve results, and they cast doubt on the common belief that sparsity is the key to ReLU's good performance.
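As a concrete illustration of the activation that reference evaluates, here is a minimal NumPy sketch of a leaky rectifier; the slope of 0.01 is a common default, not a value taken from the paper.

import numpy as np

def leaky_relu(x, negative_slope=0.01):
    # Rectified activation with a non-zero slope for the negative part;
    # negative_slope=0 recovers the standard ReLU.
    return np.where(x >= 0, x, negative_slope * x)

print(leaky_relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [-0.02 -0.005 0. 1.5]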

VR facial animation via multiview image translation

This work presents a bidirectional system that can animate avatar heads in both users' full likeness using consumer-friendly headset-mounted cameras (HMCs), addressing two main challenges: unaccommodating camera views and the image-to-avatar domain gap.

Deep appearance models for face rendering

A data-driven rendering pipeline that learns a joint representation of facial geometry and appearance from a multiview capture setup, combined with a novel unsupervised technique for mapping images to facial states, results in a system that is naturally suited to real-time interactive settings such as virtual reality (VR).
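To make the idea of a joint geometry-and-appearance representation concrete, below is a minimal PyTorch sketch of a decoder that maps a latent facial-state code and a view direction to mesh vertices and a view-conditioned texture. The class name, dimensions, and layer choices are illustrative assumptions, not the architecture from the paper.

import torch
import torch.nn as nn

class JointFaceDecoder(nn.Module):
    # Illustrative joint decoder: latent facial state -> geometry + texture.
    # All sizes below (256-D latent, 7306 vertices, 64x64 texture) are
    # made-up placeholders, not values from the paper.
    def __init__(self, latent_dim=256, num_vertices=7306, tex_size=64):
        super().__init__()
        # Geometry branch: latent code -> per-vertex 3D positions.
        self.geometry = nn.Sequential(
            nn.Linear(latent_dim, 1024), nn.ReLU(),
            nn.Linear(1024, num_vertices * 3),
        )
        # Appearance branch: latent code + view direction -> RGB texture,
        # so view-dependent effects can vary with viewpoint.
        self.texture = nn.Sequential(
            nn.Linear(latent_dim + 3, 1024), nn.ReLU(),
            nn.Linear(1024, 3 * tex_size * tex_size),
        )
        self.tex_size = tex_size

    def forward(self, z, view_dir):
        verts = self.geometry(z).view(-1, 3)                   # (V, 3)
        tex = self.texture(torch.cat([z, view_dir], dim=-1))   # view-conditioned
        return verts, tex.view(3, self.tex_size, self.tex_size)

z = torch.randn(256)             # latent facial state
v = torch.tensor([0., 0., 1.])   # camera view direction
verts, tex = JointFaceDecoder()(z, v)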

Rectified Linear Units Improve Restricted Boltzmann Machines

Replacing the binary stochastic hidden units of restricted Boltzmann machines with rectified linear units yields features that are better for object recognition on the NORB dataset and for face verification on the Labeled Faces in the Wild dataset.
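As a hedged sketch of the noisy rectified linear unit that paper analyzes, the NumPy snippet below samples max(0, x + N(0, sigmoid(x))); this shows only the sampling rule, not an RBM implementation.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def noisy_relu(x, rng=None):
    # Noisy rectified linear unit: max(0, x + N(0, sigmoid(x))),
    # where the noise variance sigmoid(x) depends on the input.
    rng = np.random.default_rng(0) if rng is None else rng
    noise = rng.normal(0.0, np.sqrt(sigmoid(x)))
    return np.maximum(0.0, x + noise)

print(noisy_relu(np.array([-1.0, 0.5, 2.0])))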

Group Normalization

Group Normalization (GN) is presented as a simple alternative to batch normalization (BN) that can outperform BN-based counterparts for object detection and segmentation on COCO and for video classification on Kinetics, showing that GN can effectively replace the powerful BN in a variety of tasks.
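As a rough sketch of what GN computes, the NumPy snippet below normalizes channels in groups using per-sample statistics, so the result does not depend on batch size; the shapes and group count are illustrative, and the learnable per-channel scale and shift are omitted for brevity.

import numpy as np

def group_norm(x, num_groups=8, eps=1e-5):
    # Group Normalization over an NCHW tensor: channels are split into
    # groups, and mean/variance are computed per sample and per group.
    n, c, h, w = x.shape
    assert c % num_groups == 0
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    g = (g - mean) / np.sqrt(var + eps)
    return g.reshape(n, c, h, w)

x = np.random.randn(2, 16, 4, 4)  # batch of 2, 16 channels
y = group_norm(x)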

EgoRenderer: Rendering Human Avatars from Egocentric Camera Images

Experimental evaluations show that EgoRenderer is capable of generating realistic free-viewpoint avatars of a person wearing an egocentric camera, and comparisons to several baselines demonstrate the advantages of the approach.

High-fidelity Face Tracking for AR/VR via Deep Lighting Adaptation

A deep lighting model is learned that, in combination with a high-quality 3D face-tracking algorithm, provides a method for subtle and robust facial motion transfer from a regular video to a 3D photo-realistic avatar.

Semantic Deep Face Models

This work presents a method for nonlinear 3D face modeling using neural architectures that provides intuitive semantic control over both identity and expression by disentangling these dimensions from each other, essentially combining the benefits of multi-linear face models and nonlinear deep face networks.

Learning an animatable detailed 3D face model from in-the-wild images

This work presents the first approach that regresses 3D face shape and animatable details that are specific to an individual but change with expression, and introduces a novel detail-consistency loss that disentangles person-specific details from expression-dependent wrinkles.

The Eyes Have It: An Integrated Eye and Face Model for Photorealistic Facial Animation

This work presents a jointly learnable 3D face and eyeball model that better represents gaze direction and upper facial expressions, together with a method for disentangling the gaze of the left and right eyes from each other and from the rest of the face, allowing the model to represent human gaze in VR.