Real-Time Facial Segmentation and Performance Capture from RGB Input

Shunsuke Saito, Tianye Li, and Hao Li. European Conference on Computer Vision.
We introduce the concept of unconstrained real-time 3D facial performance capture through explicit semantic segmentation of the RGB input. Building on recent breakthroughs in deep learning, we demonstrate that pixel-level facial segmentation is possible in real time by repurposing convolutional neural networks originally designed for general semantic segmentation.

Self-supervised CNN for Unconstrained 3D Facial Performance Capture from an RGB-D Camera

A novel method for real-time 3D facial performance capture with consumer-level RGB-D sensors is presented that is robust to severe occlusion, fast motion, large rotation, exaggerated facial expressions, and diverse lighting, and that augments the training data set in new ways.

Production-level facial performance capture using deep convolutional neural networks

A real-time deep learning framework for video-based facial performance capture, that is, the dense 3D tracking of an actor's face given a monocular video, which can drastically reduce the amount of labor involved in the development of modern narrative-driven video games or films involving realistic digital doubles of actors and potentially hours of animated dialogue per character.

Learning Dense Facial Correspondences in Unconstrained Images

This work presents a minimalist but effective neural network that computes dense facial correspondences in highly unconstrained RGB images and demonstrates successful per-frame processing under extreme pose variations, occlusions, and lighting conditions.

Real-Time 3D Facial Tracking via Cascaded Compositional Learning

The experimental results indicate that a model trained purely on synthetic facial imagery can hardly generalize well to unconstrained real-world data, and that involving synthetic faces in training benefits tracking in certain scenarios but degrades the tracking model's generalization ability.

Accurate 3D Face Reconstruction With Weakly-Supervised Learning: From Single Image to Image Set

A novel deep 3D face reconstruction approach is proposed that leverages a robust, hybrid loss function for weakly-supervised learning, taking into account both low-level and perception-level information for supervision, and performs multi-image face reconstruction by exploiting complementary information from different images for shape aggregation.

Visual Speech-Aware Perceptual 3D Facial Expression Reconstruction from Videos

The first method for visual speech-aware perceptual 3D facial expression reconstruction from videos is presented, verified through exhaustive objective evaluations on three large-scale datasets, as well as subjective evaluation with two web-based user studies.

3D facial performance capture from monocular RGB video

New methods targeted at 3D facial geometry reconstruction are presented that are more efficient than existing generic 3D geometry reconstruction methods and allow for high-quality results under fewer constraints.

UMDFaces: An annotated face dataset for training deep networks

A new face dataset, called UMDFaces, is introduced, which has 367,888 annotated faces of 8,277 subjects and a new face recognition evaluation protocol is introduced which will help advance the state-of-the-art in this area.

Delving into High-Quality Synthetic Face Occlusion Segmentation Datasets

This study proposes two occlusion generation techniques: Naturalistic Occlusion Generation (NatOcc), for producing high-quality naturalistic synthetic occluded faces, and Random Occlusion Generation (RandOcc), a more general method for generating synthetic occluded data.

3DFaceNet: Real-time Dense Face Reconstruction via Synthesizing Photo-realistic Face Images

A novel face data generation method is presented that renders a large number of photo-realistic face images with different attributes based on inverse rendering, together with a coarse-to-fine learning framework consisting of three convolutional networks.

Unconstrained realtime facial performance capture

This work introduces a realtime facial tracking system specifically designed for performance capture in unconstrained settings using a consumer-level RGB-D sensor and demonstrates robust and high-fidelity facial tracking on a wide range of subjects with highly incomplete and largely occluded data.

Automatic acquisition of high-fidelity facial performances using monocular videos

A facial performance capture system that automatically captures high-fidelity facial performances using uncontrolled monocular videos and uses per-pixel shading cues to add fine-scale surface details such as emerging or disappearing wrinkles and folds into large-scale facial deformation to improve the accuracy of facial reconstruction.

Realtime facial animation with on-the-fly correctives

It is demonstrated that using an adaptive PCA model not only improves the fitting accuracy for tracking but also increases the expressiveness of the retargeted character.

Driving High-Resolution Facial Scans with Video Performance Capture

A process is presented for rendering a realistic facial performance with control of viewpoint and illumination; it optimally combines the weighted triangulation constraints, along with a shape regularization term, into a consistent 3D geometry solution over the entire performance that is drift-free by construction.

Real-time high-fidelity facial performance capture

This work proposes an automatic way to detect and align the local patches required to train the regressors and to run them efficiently, resulting in high-fidelity facial performance reconstruction with person-specific wrinkle details from a monocular video camera in real time.

Dense 3D motion capture for human faces

The adaptability of the proposed regularization scheme to nonrigid tangential motion does not hamper its robustness: it successfully recovers the shape and motion of cloth without overfitting, despite the absence of stretch or shear in that case.

Active appearance models with occlusion

Structured Semi-supervised Forest for Facial Landmarks Localization with Face Mask Reasoning

The proposed method improves on existing Decision Forests approaches to facial landmark localization, aided by face mask reasoning, and yields promising results in face mask reasoning itself.

Total Moving Face Reconstruction

This work presents an approach that takes a single video of a person's face and reconstructs a high-detail 3D shape for each video frame, coupled with shape-from-shading, using the large number of photos available per individual in personal or internet photo collections.

Robust Face Landmark Estimation under Occlusion

This work proposes a novel method, called Robust Cascaded Pose Regression (RCPR), which reduces exposure to outliers by detecting occlusions explicitly and using robust shape-indexed features, and shows that RCPR improves on previous landmark estimation methods on three popular face datasets.