Leveraging MoCap Data for Human Mesh Recovery

@article{baradel2021leveraging,
  title={Leveraging MoCap Data for Human Mesh Recovery},
  author={Fabien Baradel and Thibault Groueix and Philippe Weinzaepfel and Romain Br{\'e}gier and Yannis Kalantidis and Gr{\'e}gory Rogez},
  journal={2021 International Conference on 3D Vision (3DV)},
  year={2021}
}
Training state-of-the-art models for human body pose and shape recovery from images or videos requires datasets with corresponding annotations that are hard and expensive to obtain. Our goal in this paper is to study whether poses from 3D Motion Capture (MoCap) data can be used to improve image-based and video-based human mesh recovery methods. We find that fine-tuning image-based models with synthetic renderings from MoCap data can increase their performance, by providing them with a… 
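The fine-tuning idea from the abstract can be sketched as a data-mixing step: each training batch combines real annotated images with synthetic renderings of MoCap poses, so the model sees a far wider pose distribution than the real data alone covers. The helper below is a hypothetical illustration (the function name, ratio, and batch layout are assumptions, not the paper's actual pipeline):

```python
import random

def make_finetune_batch(real_samples, mocap_renders, synth_ratio=0.5,
                        batch_size=8, rng=None):
    """Hypothetical sketch: mix real images with synthetic MoCap renderings.

    `synth_ratio` controls what fraction of the batch comes from
    synthetic renderings of MoCap poses.
    """
    rng = rng or random.Random(0)
    n_synth = int(round(batch_size * synth_ratio))
    # Draw from both pools, then shuffle so the model cannot rely on ordering.
    batch = (rng.sample(mocap_renders, n_synth)
             + rng.sample(real_samples, batch_size - n_synth))
    rng.shuffle(batch)
    return batch

# Toy usage with labeled placeholder samples.
real = [("img", i) for i in range(100)]
synth = [("render", i) for i in range(100)]
batch = make_finetune_batch(real, synth, synth_ratio=0.25, batch_size=8)
```

With `synth_ratio=0.25` and a batch size of 8, each batch carries two synthetic renderings and six real images.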


Convolutional Mesh Regression for Single-Image Human Shape Reconstruction
This paper addresses the problem of 3D human pose and shape estimation from a single image by proposing graph-based mesh regression, which outperforms comparable baselines relying on model parameter regression and achieves state-of-the-art results among model-based pose estimation approaches.
End-to-End Recovery of Human Shape and Pose
This work introduces an adversary trained to tell whether human body shape and pose parameters are real or not using a large database of 3D human meshes, and produces a richer and more useful mesh representation that is parameterized by shape and 3D joint angles.
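The adversarial prior described above can be illustrated with the generator-side term of a least-squares GAN objective, which is the form HMR uses: the regressor is penalized when the discriminator scores its predicted shape and pose parameters as unrealistic. This is a minimal sketch of that one loss term, not the full HMR training setup:

```python
import numpy as np

def adversarial_prior_loss(disc_scores_fake):
    """Generator-side least-squares adversarial loss (sketch).

    `disc_scores_fake` are the discriminator's scores for the network's
    predicted (shape, pose) parameters; the regressor is pushed toward
    parameters the discriminator scores as real (score -> 1):
    E[(D(fake) - 1)^2].
    """
    return float(np.mean((np.asarray(disc_scores_fake) - 1.0) ** 2))

# Toy usage: scores of 0.5 and 1.5 both sit 0.5 away from the "real" target.
loss = adversarial_prior_loss(np.array([0.5, 1.5]))
```

A discriminator trained on a large mesh database (e.g. MoCap-derived SMPL parameters) thus acts as a learned prior keeping regressed poses on the manifold of plausible human bodies.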
AMASS: Archive of Motion Capture As Surface Shapes
AMASS is introduced, a large and varied database of human motion that unifies 15 different optical marker-based mocap datasets by representing them within a common framework and parameterization and makes it readily useful for animation, visualization, and generating training data for deep learning.
MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild
This paper introduces an image-based synthesis engine that artificially augments a dataset of real images with 2D human pose annotations using 3D Motion Capture (MoCap) data to generate a large set of photorealistic synthetic images of humans with 3D pose annotations.
Learning to Reconstruct 3D Human Pose and Shape via Model-Fitting in the Loop
The core of the proposed approach SPIN (SMPL oPtimization IN the loop) is that the two paradigms can form a strong collaboration, and better network estimates can lead the optimization to better solutions, while more accurate optimization fits provide better supervision for the network.
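The collaboration SPIN describes can be sketched as one schematic training step: the network's regression initializes an optimization-based model fit (e.g. SMPLify against 2D keypoints), and the resulting fit then supervises the network. The function below is an illustration of that control flow only; `regress` and `fit` are hypothetical stand-ins, and the loss is a toy squared error on parameters:

```python
def spin_step(regress, fit, image, keypoints_2d):
    """One SPIN-style iteration (schematic).

    regress: network mapping an image to model parameters.
    fit: optimization routine refining those parameters against 2D evidence.
    Returns the refined parameters and the supervision loss they induce
    on the network's initial estimate.
    """
    theta_init = regress(image)              # network estimate
    theta_fit = fit(theta_init, keypoints_2d)  # model-fitting in the loop
    # The optimized fit supervises the network's own prediction.
    loss = sum((a - b) ** 2 for a, b in zip(theta_fit, theta_init))
    return theta_fit, loss

# Toy usage with stub functions standing in for the network and the fitter.
regress = lambda img: [0.0, 0.0]
fit = lambda theta, kps: [1.0, 1.0]
theta, loss = spin_step(regress, fit, image=None, keypoints_2d=None)
```

Because the fit starts from the network's estimate, better regression yields better optima, which in turn yield stronger supervision, which is the mutual reinforcement the abstract refers to.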
SMPLy Benchmarking 3D Human Pose Estimation in the Wild
A pipeline is presented to easily produce and validate such a dataset with accurate ground truth, built on the recently introduced Mannequin Challenge dataset, and used to benchmark recent 3D human pose estimation methods in the wild.
Neural Body Fitting: Unifying Deep Learning and Model Based Human Pose and Shape Estimation
A novel approach, Neural Body Fitting (NBF), is proposed that integrates a statistical body model as a layer within a CNN, leveraging both reliable bottom-up body part segmentation and robust top-down body model constraints.
TexturePose: Supervising Human Mesh Estimation With Texture Consistency
This work proposes a natural form of supervision, that capitalizes on the appearance constancy of a person among different frames (or viewpoints) and achieves state-of-the-art results among model-based pose estimation approaches in different benchmarks.
BodyNet: Volumetric Inference of 3D Human Body Shapes
BodyNet is an end-to-end trainable network that benefits from a volumetric 3D loss, a multi-view re-projection loss, and intermediate supervision of 2D pose, 2D body part segmentation, and 3D pose and achieves state-of-the-art performance.
Delving Deep Into Hybrid Annotations for 3D Human Recovery in the Wild
This work focuses on the challenging task of in-the-wild 3D human recovery from single images when paired 3D annotations are not fully available, and shows that incorporating dense correspondence into in-the-wild 3D human recovery is promising and competitive due to its high efficiency and relatively low annotation cost.