Procedural Humans for Computer Vision

Charlie Hewitt, Tadas Baltrušaitis, Erroll Wood, Lohit Petikam, Louis Florentin, Hanz Cuevas Velasquez
Recent work has shown the benefits of synthetic data for use in computer vision, with applications ranging from autonomous driving [17, 18] to face landmark detection [20] and reconstruction [19]. Synthetic data offers a number of benefits, from privacy preservation and bias elimination [1, 12] to the quality and feasibility of annotation [19]. Generating human-centered synthetic data is a particular challenge in terms of realism and domain gap, though recent work has shown that effective…

Fake it till you make it: face analysis in the wild using synthetic data alone

This work shows that face data can be synthesized with minimal domain gap, so that machine learning models trained on synthetic data alone generalize to real in-the-wild datasets for face-related tasks such as landmark localization and face parsing.

DigiFace-1M: 1 Million Digital Face Images for Face Recognition

This work introduces a large-scale synthetic dataset for face recognition, obtained by rendering digital faces using a computer graphics pipeline and demonstrates that aggressive data augmentation can significantly reduce the synthetic-to-real domain gap.
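The summary above credits aggressive data augmentation with closing the synthetic-to-real gap. As a hedged illustration only, the numpy sketch below shows the kind of pipeline meant (random flip, crop, and photometric jitter); the specific operations and magnitudes are invented for this example, not DigiFace-1M's actual recipe.

```python
import numpy as np

def augment(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply a few illustrative aggressive augmentations to an HxWx3 uint8 image."""
    # Random horizontal flip.
    if rng.random() < 0.5:
        img = img[:, ::-1]
    # Random crop to 80% of each dimension (crop factor is arbitrary here).
    h, w = img.shape[:2]
    ch, cw = int(h * 0.8), int(w * 0.8)
    y = rng.integers(0, h - ch + 1)
    x = rng.integers(0, w - cw + 1)
    img = img[y:y + ch, x:x + cw]
    # Random brightness/contrast jitter.
    gain = rng.uniform(0.6, 1.4)
    bias = rng.uniform(-30, 30)
    return np.clip(img.astype(np.float32) * gain + bias, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
out = augment(img, rng)
print(out.shape, out.dtype)  # (51, 51, 3) uint8
```

In practice such transforms would be composed randomly per training sample, with strengths tuned on a real-image validation set.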

AMASS: Archive of Motion Capture As Surface Shapes

AMASS is introduced, a large and varied database of human motion that unifies 15 different optical marker-based mocap datasets by representing them within a common framework and parameterization, making it readily useful for animation, visualization, and generating training data for deep learning.

Monocular Expressive Body Regression through Body-Driven Attention

This work introduces ExPose (EXpressive POse and Shape rEgression), which directly regresses the body, face, and hands in SMPL-X format from an RGB image, estimating expressive 3D humans more accurately than existing optimization methods at a small fraction of the computational cost.

Expressive Body Capture: 3D Hands, Face, and Body From a Single Image

This work uses the new method, SMPLify-X, to fit SMPL-X to both controlled images and images in the wild, and evaluates 3D accuracy on a new curated dataset comprising 100 images with pseudo ground-truth.
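SMPLify-X is an optimization-based fitter: the model's joints are a differentiable function of pose and shape parameters, and fitting minimizes the distance between model joints and detected 2D keypoints. The toy numpy sketch below illustrates only that optimization idea; the linear joint model, dimensions, and step-size rule are all invented for the example and bear no relation to the real SMPL-X skeleton or energy terms.

```python
import numpy as np

rng = np.random.default_rng(1)
n_joints, n_params = 8, 4
J_template = rng.normal(size=(n_joints, 2))        # rest 2D joint positions (toy)
J_dirs = rng.normal(size=(n_joints, 2, n_params))  # made-up linear joint basis

def joints(params):
    # Model joints as a differentiable function of the parameters.
    return J_template + J_dirs @ params

true_params = rng.normal(size=n_params)
target = joints(true_params)            # stand-in for 2D keypoint detections

# Gradient descent on the squared joint-reprojection error.
A = J_dirs.reshape(-1, n_params)
lr = 1.0 / np.linalg.eigvalsh(A.T @ A).max()       # stable step size
params = np.zeros(n_params)
for _ in range(1000):
    residual = joints(params) - target             # (n_joints, 2)
    params -= lr * np.einsum('jd,jdp->p', residual, J_dirs)

print(np.abs(params - true_params).max())          # near zero after fitting
```

The real system adds pose, shape, and interpenetration priors to this data term and optimizes with a second-order method rather than plain gradient descent.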

3D face reconstruction with dense landmarks

This work presents the first method that accurately predicts 10× as many landmarks as usual, covering the whole head, including the eyes and teeth, using synthetic training data, and achieves state-of-the-art results for monocular 3D face reconstruction in the wild.

Learning to Dress 3D People in Generative Clothing

This work learns a generative 3D mesh model of clothed people from 3D scans with varying pose and clothing, and is the first generative model that directly dresses 3D human body meshes and generalizes to different poses.

Synthetic Data for Multi-Parameter Camera-Based Physiological Sensing

This work leverages a high-fidelity synthetics pipeline for generating videos of faces with faithful blood flow and breathing patterns and presents systematic experiments showing how physiologically-grounded synthetic data can be used in training camera-based multi-parameter cardiopulmonary sensing.

MoSh: motion and shape capture from sparse markers

This work illustrates MoSh by recovering body shape, pose, and soft-tissue motion from archival mocap data, using it to produce animations with subtlety and realism, and shows how to magnify 3D soft-tissue deformations to create animations with appealing exaggerations.

SMPL: a skinned multi-person linear model

The Skinned Multi-Person Linear model (SMPL) is a skinned, vertex-based model that accurately represents a wide variety of body shapes in natural human poses and is compatible with existing graphics pipelines and rendering engines.
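The vertex-based design summarized above can be sketched in a few lines: a template mesh is deformed by identity-dependent shape blendshapes and then posed with standard linear blend skinning, which is exactly what makes it drop into existing engines. The numpy example below is a minimal illustration of that structure only; the mesh, blendshape basis, and skinning weights are random placeholders, not SMPL's learned data, and SMPL's pose-dependent corrective blendshapes are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_verts, n_betas, n_joints = 6, 3, 2

T = rng.normal(size=(n_verts, 3))                   # template vertices (toy)
S = rng.normal(size=(n_verts, 3, n_betas)) * 0.1    # shape blendshape basis (toy)
W = rng.dirichlet(np.ones(n_joints), size=n_verts)  # skinning weights, rows sum to 1

def smpl_like(betas, joint_transforms):
    """betas: (n_betas,); joint_transforms: (n_joints, 4, 4) rigid transforms."""
    v = T + S @ betas                               # shaped rest-pose vertices
    v_h = np.concatenate([v, np.ones((n_verts, 1))], axis=1)  # homogeneous coords
    # Linear blend skinning: per-vertex transform is a weighted blend of joints.
    blended = np.einsum('vj,jab->vab', W, joint_transforms)
    return np.einsum('vab,vb->va', blended, v_h)[:, :3]

identity = np.tile(np.eye(4), (n_joints, 1, 1))
betas = np.array([0.5, -1.0, 0.2])
posed = smpl_like(betas, identity)
# With identity joint transforms the output equals the shaped template.
print(np.allclose(posed, T + S @ betas))  # True
```

Because both the shape correction and the skinning are linear operations, the whole model evaluates as cheaply as any standard skinned mesh.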