Learning Nonparametric Human Mesh Reconstruction from a Single Image without Ground Truth Meshes
Kevin Lin, Lijuan Wang, Ying Jin, Zicheng Liu, Ming-Ting Sun
Nonparametric approaches have shown promising results in reconstructing a 3D human mesh from a single monocular image. Unlike previous approaches that use a parametric human model such as the skinned multi-person linear model (SMPL) and attempt to regress its parameters, nonparametric approaches relax the heavy reliance on the parametric space. However, existing nonparametric methods require ground truth meshes as the regression target for each vertex, and obtaining ground truth mesh labels is…
Leaving Flatland: Advances in 3D behavioral measurement
Continued advances at the intersection of deep learning and computer vision will facilitate 3D tracking across more anatomical features, with less training data, in additional species, and within more natural, occlusive environments.


Convolutional Mesh Regression for Single-Image Human Shape Reconstruction
This paper addresses the problem of 3D human pose and shape estimation from a single image by proposing graph-based mesh regression, which outperforms comparable baselines relying on model parameter regression and achieves state-of-the-art results among model-based pose estimation approaches.
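The core operation behind graph-based mesh regression is a graph convolution over mesh vertices: each vertex updates its feature by mixing its own feature with those of its neighbors. The following is a minimal sketch of one such propagation step; the scalar weights, toy features, and tiny triangle mesh are illustrative placeholders, not the paper's actual architecture.

```python
# A single graph-convolution step over mesh vertices, assuming each vertex
# carries a feature vector and the mesh connectivity is an adjacency list.
# weight_self / weight_neigh are hypothetical stand-ins for learned weights.

def graph_conv(features, adjacency, weight_self, weight_neigh):
    """Combine each vertex's feature with the mean of its neighbors'."""
    out = []
    for v, feat in enumerate(features):
        neigh = adjacency[v]
        mean_neigh = [
            sum(features[n][d] for n in neigh) / len(neigh)
            for d in range(len(feat))
        ]
        out.append([
            weight_self * feat[d] + weight_neigh * mean_neigh[d]
            for d in range(len(feat))
        ])
    return out

# Toy triangle mesh: 3 vertices, each adjacent to the other two.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adjacency = [[1, 2], [0, 2], [0, 1]]
smoothed = graph_conv(features, adjacency, weight_self=0.5, weight_neigh=0.5)
```

Stacking such layers lets information from image-conditioned vertex features propagate across the mesh surface before per-vertex 3D coordinates are regressed.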
Detailed Human Shape Estimation From a Single Image by Hierarchical Mesh Deformation
A novel learning-based framework is proposed that combines the robustness of a parametric model with the flexibility of free-form 3D deformation and is able to restore detailed human body shapes beyond skinned models.
Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images
An end-to-end deep learning architecture that produces a 3D shape in triangular mesh from a single color image by progressively deforming an ellipsoid, leveraging perceptual features extracted from the input image.
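The progressive-deformation idea in Pixel2Mesh can be sketched as repeatedly displacing template vertices by predicted per-vertex offsets, coarse to fine. In this hedged sketch the offsets are hard-coded placeholders standing in for what the network would predict from image features, and the two-vertex "template" stands in for the initial ellipsoid.

```python
# Coarse-to-fine mesh deformation: start from template vertices and apply
# per-stage, per-vertex offsets (placeholders for network predictions).

def deform(vertices, offsets):
    """Move every vertex by its offset for this stage."""
    return [
        tuple(c + d for c, d in zip(v, o))
        for v, o in zip(vertices, offsets)
    ]

template = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]   # toy "ellipsoid" vertices
stage_offsets = [
    [(0.1, 0.0, 0.0), (0.0, -0.1, 0.0)],        # stage 1 (coarse)
    [(0.0, 0.05, 0.0), (0.02, 0.0, 0.0)],       # stage 2 (finer)
]

mesh = template
for offsets in stage_offsets:
    mesh = deform(mesh, offsets)
```

In the actual method each stage also unpools the mesh to a higher resolution and re-samples perceptual features from the image; this sketch keeps only the deformation step.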
3D Hand Shape and Pose Estimation From a Single RGB Image
This work proposes a Graph Convolutional Neural Network (Graph CNN) based method to reconstruct a full 3D mesh of hand surface that contains richer information of both 3D hand shape and pose and proposes a weakly-supervised approach by leveraging the depth map as a weak supervision in training.
Learning to Estimate 3D Human Pose and Shape from a Single Color Image
This work addresses the problem of estimating the full body 3D human pose and shape from a single color image and proposes an efficient and effective direct prediction method based on ConvNets, incorporating a parametric statistical body shape model (SMPL) within an end-to-end framework.
Generating 3D faces using Convolutional Mesh Autoencoders
This work introduces a versatile model that learns a non-linear representation of a face using spectral convolutions on a mesh surface and shows that replacing the expression space of an existing state-of-the-art face model with this model achieves a lower reconstruction error.
End-to-End Recovery of Human Shape and Pose
This work introduces an adversary trained to tell whether human body shape and pose parameters are real or not using a large database of 3D human meshes, and produces a richer and more useful mesh representation that is parameterized by shape and 3D joint angles.
TexturePose: Supervising Human Mesh Estimation With Texture Consistency
This work proposes a natural form of supervision that capitalizes on the appearance constancy of a person across different frames (or viewpoints) and achieves state-of-the-art results among model-based pose estimation approaches on different benchmarks.
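The appearance-constancy supervision above amounts to a photometric loss: the same surface point should have the same texture value in two frames, so the difference between the two sampled texture maps, restricted to texels visible in both, can train the model without 3D labels. A minimal sketch, with placeholder texel values in place of texture maps sampled from real frames:

```python
# Texture-consistency loss: mean absolute difference between per-texel
# values sampled from two frames, over texels visible in both.

def texture_consistency_loss(texture_a, texture_b, visible):
    """Average |a - b| over mutually visible texels."""
    diffs = [
        abs(a - b)
        for a, b, vis in zip(texture_a, texture_b, visible)
        if vis
    ]
    return sum(diffs) / len(diffs)

loss = texture_consistency_loss(
    texture_a=[0.8, 0.5, 0.2, 0.9],
    texture_b=[0.7, 0.5, 0.4, 0.1],
    visible=[True, True, True, False],  # last texel occluded in one frame
)
# loss averages the diffs 0.1, 0.0, 0.2 over the 3 visible texels
```

Masking out occluded texels is what makes the signal usable under viewpoint changes: only surface points seen in both frames contribute to the loss.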
Deformable Shape Completion with Graph Convolutional Autoencoders
This work proposes a novel learning-based method for the completion of partial shapes, using a variational autoencoder with graph convolutional operations that learns a latent space of complete, realistic shapes and finds the generated shape that best fits the known partial input.
Learning Category-Specific Mesh Reconstruction from Image Collections
A learning framework for recovering the 3D shape, camera, and texture of an object from a single image by incorporating texture inference as prediction of an image in a canonical appearance space and shows that semantic keypoints can be easily associated with the predicted shapes.