NeuralAnnot: Neural Annotator for 3D Human Mesh Training Sets

@inproceedings{Moon2022NeuralAnnotNA,
  title={NeuralAnnot: Neural Annotator for 3D Human Mesh Training Sets},
  author={Gyeongsik Moon and Hongsuk Choi and Kyoung Mu Lee},
  booktitle={2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
  year={2022},
  pages={2298-2306}
}
Most 3D human mesh regressors are fully supervised with 3D pseudo-GT human model parameters and weakly supervised with GT 2D/3D joint coordinates as the 3D pseudo-GTs bring great performance gain. The 3D pseudo-GTs are obtained by annotators, systems that iteratively fit 3D human model parameters to GT 2D/3D joint coordinates of training sets in the pre-processing stage of the regressors. The fitted 3D parameters at the last fitting iteration become the 3D pseudo-GTs, used to fully supervise…
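
The fitting procedure described above is, at its core, a per-sample optimization of human model parameters against GT joints. Below is a minimal PyTorch sketch of that generic idea, assuming stand-in helpers smpl_joints_fn (a SMPL forward pass returning 3D joints) and project_fn (a camera projection); it illustrates a conventional fitting-based annotator, not the NeuralAnnot method itself.

import torch

def fit_sample(gt_joints_2d, conf_2d, smpl_joints_fn, project_fn,
               gt_joints_3d=None, n_iters=500, lr=0.01):
    """Iteratively fit SMPL pose/shape so that projected joints match the GT
    2D (and, for MoCap data, 3D) joint coordinates; the parameters from the
    last iteration become the 3D pseudo-GT for this sample."""
    pose = torch.zeros(72, requires_grad=True)   # axis-angle pose parameters
    shape = torch.zeros(10, requires_grad=True)  # SMPL shape coefficients
    optim = torch.optim.Adam([pose, shape], lr=lr)

    for _ in range(n_iters):
        optim.zero_grad()
        joints_3d = smpl_joints_fn(pose, shape)        # (J, 3), assumed helper
        joints_2d = project_fn(joints_3d)              # (J, 2), assumed helper
        # confidence-weighted 2D reprojection term
        loss = (conf_2d * (joints_2d - gt_joints_2d).pow(2).sum(-1)).mean()
        if gt_joints_3d is not None:                   # 3D term when available
            loss = loss + (joints_3d - gt_joints_3d).pow(2).sum(-1).mean()
        loss = loss + 1e-3 * shape.pow(2).mean()       # mild shape regularizer
        loss.backward()
        optim.step()

    return pose.detach(), shape.detach()               # 3D pseudo-GT parameters

NeuralAnnot's proposal is to replace this per-sample test-time optimization with a learned annotator network trained over the whole dataset, but the resulting pseudo-GTs play the same supervisory role for downstream regressors.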

Citations of this paper

PyMAF-X: Towards Well-aligned Full-body Model Regression from Monocular Images

TLDR
A Pyramidal Mesh Alignment Feedback (PyMAF) loop is proposed in the regression network for well-aligned human mesh recovery, and is extended to PyMAF-X for recovering expressive full-body models, improving mesh-image alignment and achieving new state-of-the-art results.

Accurate 3D Hand Pose Estimation for Whole-Body 3D Human Mesh Estimation

TLDR
This work designs Pose2Pose, a module that utilizes joint features to predict 3D joint rotations along the human kinematic chain, and presents Hand4Whole, whose use of both body and hand joint features yields markedly more accurate 3D wrist and finger rotations than previous whole-body methods.

References

Showing 1-10 of 34 references

Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation

TLDR
This paper augments existing 2D datasets with high-quality 3D pose fits obtained via Exemplar Fine-Tuning (EFT), and shows that EFT produces 3D annotations that yield better downstream performance and are qualitatively preferable in an extensive human-based assessment.
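
EFT's mechanism lends itself to a short sketch: a pretrained 3D regressor is copied and fine-tuned on a single image using only a 2D keypoint reprojection loss, and the fine-tuned 3D output is stored as the annotation. The interface below (a model returning pose, shape, and camera, plus a project_fn helper) is assumed for illustration and is not the official EFT code.

import copy
import torch

def eft_annotate(image, gt_joints_2d, pretrained_regressor, project_fn,
                 n_steps=20, lr=1e-5):
    """Exemplar fine-tuning sketch: adapt a copy of a pretrained regressor to
    one image via the 2D reprojection loss, keep its 3D output as pseudo-GT,
    then discard the exemplar-specific weights."""
    model = copy.deepcopy(pretrained_regressor)        # per-exemplar copy
    optim = torch.optim.Adam(model.parameters(), lr=lr)

    for _ in range(n_steps):
        optim.zero_grad()
        pose, shape, cam = model(image)                # assumed output format
        joints_2d = project_fn(pose, shape, cam)       # reprojected joints
        loss = (joints_2d - gt_joints_2d).pow(2).sum(-1).mean()
        loss.backward()
        optim.step()

    with torch.no_grad():
        pose, shape, _ = model(image)
    return pose, shape                                 # stored as 3D annotation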

Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image

TLDR
The first method to automatically estimate the 3D pose of the human body as well as its 3D shape from a single unconstrained image is described, showing superior pose accuracy with respect to the state of the art.

End-to-End Recovery of Human Shape and Pose

TLDR
This work introduces an adversary trained to tell whether human body shape and pose parameters are real or not using a large database of 3D human meshes, and produces a richer and more useful mesh representation that is parameterized by shape and 3D joint angles.
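
The adversarial prior can be pictured as a small discriminator over SMPL parameters trained against samples from a MoCap database. The PyTorch sketch below is a simplified illustration; the actual paper factorizes the discriminator per joint and operates on rotation-matrix inputs.

import torch
import torch.nn as nn

class ParamDiscriminator(nn.Module):
    """Simplified stand-in for the adversarial prior: an MLP scoring
    concatenated SMPL pose (72) + shape (10) parameters as real or fake."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(82, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1))

    def forward(self, params):                     # params: (B, 82)
        return self.net(params)

def adversarial_losses(disc, pred_params, mocap_params):
    """Least-squares GAN losses: the discriminator separates regressor outputs
    from MoCap samples, while the regressor is pushed toward parameters the
    discriminator scores as real."""
    d_fake = disc(pred_params.detach())
    d_real = disc(mocap_params)
    disc_loss = (d_real - 1).pow(2).mean() + d_fake.pow(2).mean()
    gen_loss = (disc(pred_params) - 1).pow(2).mean()
    return gen_loss, disc_loss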

I2L-MeshNet: Image-to-Lixel Prediction Network for Accurate 3D Human Pose and Mesh Estimation from a Single RGB Image

TLDR
The proposed I2L-MeshNet predicts the per-lixel likelihood on 1D heatmaps for each mesh vertex coordinate instead of directly regressing the parameters, which preserves the spatial relationship in the input image and models the prediction uncertainty.
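
The lixel (line-pixel) idea boils down to predicting a 1D heatmap per vertex per axis and taking a soft-argmax over it, so the network keeps a full spatial distribution rather than regressing a single number. A minimal sketch follows; the tensor shapes and final rescaling are illustrative, not the exact I2L-MeshNet implementation.

import torch

def lixel_soft_argmax(heatmaps_1d, axis_length):
    """heatmaps_1d: (B, V, L) raw scores over L lixels for one axis (x, y or z).
    Returns (B, V) continuous coordinates as the softmax-weighted expectation,
    which preserves the spatial distribution (and hence the uncertainty)."""
    prob = torch.softmax(heatmaps_1d, dim=-1)              # per-lixel likelihood
    positions = torch.arange(heatmaps_1d.shape[-1],
                             dtype=prob.dtype, device=prob.device)
    coords = (prob * positions).sum(dim=-1)                # expectation over lixels
    return coords / (heatmaps_1d.shape[-1] - 1) * axis_length

# usage sketch: x = lixel_soft_argmax(hm_x, img_width)
#               y = lixel_soft_argmax(hm_y, img_height)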

Learning to Reconstruct 3D Human Pose and Shape via Model-Fitting in the Loop

TLDR
The core of the proposed approach SPIN (SMPL oPtimization IN the loop) is that the two paradigms can form a strong collaboration, and better network estimates can lead the optimization to better solutions, while more accurate optimization fits provide better supervision for the network.
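
A compressed sketch of that collaboration, with hypothetical helpers (regressor, smplify_fit, supervised_loss, pseudo_gt_bank) standing in for the actual SPIN components:

def spin_training_step(images, gt_joints_2d, sample_idx, regressor, smplify_fit,
                       supervised_loss, pseudo_gt_bank, optimizer):
    # 1) the network predicts SMPL parameters from the image
    pred_pose, pred_shape = regressor(images)

    # 2) the prediction initializes the in-the-loop optimization (SMPLify),
    #    which refines the fit against the GT 2D joints
    fit_pose, fit_shape, fit_loss = smplify_fit(
        init_pose=pred_pose.detach(), init_shape=pred_shape.detach(),
        joints_2d=gt_joints_2d)

    # 3) keep whichever fit (new or previously stored) explains the 2D joints best
    pseudo_gt_bank.update(sample_idx, fit_pose, fit_shape, fit_loss)
    best_pose, best_shape = pseudo_gt_bank.get(sample_idx)

    # 4) the improving fits supervise the network directly in parameter space
    loss = supervised_loss(pred_pose, pred_shape, best_pose, best_shape)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss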

Convolutional Mesh Regression for Single-Image Human Shape Reconstruction

TLDR
This paper addresses the problem of 3D human pose and shape estimation from a single image by proposing a graph-based mesh regression, which outperforms comparable baselines relying on model parameter regression and achieves state-of-the-art results among model-based pose estimation approaches.
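
The graph-based regression operates directly on mesh vertices: image features attached to each vertex are mixed with those of neighboring vertices through the mesh adjacency before 3D coordinates are regressed. A toy graph-convolution layer in PyTorch, with illustrative shapes and normalization rather than the paper's exact formulation:

import torch
import torch.nn as nn

class SimpleGraphConv(nn.Module):
    """Toy mesh graph convolution: aggregate each vertex's features over its
    neighbors via a row-normalized adjacency matrix, then transform them."""
    def __init__(self, in_dim, out_dim, adjacency):
        super().__init__()
        self.register_buffer('A', adjacency)       # (V, V), row-normalized
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x):                           # x: (B, V, in_dim)
        x = self.A @ x                              # neighborhood aggregation
        return torch.relu(self.linear(x))           # per-vertex transform

# stacking a few such layers and ending with a 3-channel output yields the
# per-vertex 3D coordinates regressed in place of model parameters.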

Expressive Body Capture: 3D Hands, Face, and Body From a Single Image

TLDR
This work uses the new method, SMPLify-X, to fit SMPL-X to both controlled images and images in the wild, and evaluates 3D accuracy on a new curated dataset comprising 100 images with pseudo ground-truth.

AGORA: Avatars in Geography Optimized for Regression Analysis

TLDR
This work introduces AGORA, a synthetic dataset with high realism and highly accurate ground truth, and evaluates existing state-of-the-art methods for 3D human pose estimation on this dataset, finding that most methods perform poorly on images of children.

Monocular 3D Human Pose Estimation in the Wild Using Improved CNN Supervision

We propose a CNN-based approach for 3D human body pose estimation from single RGB images that addresses the issue of limited generalizability of models trained solely on the starkly limited publicly available 3D pose data.

FreiHAND: A Dataset for Markerless Capture of Hand Pose and Shape From Single RGB Images

TLDR
This paper introduces the first large-scale, multi-view hand dataset that is accompanied by both 3D hand pose and shape annotations and proposes an iterative, semi-automated 'human-in-the-loop' approach, which includes hand fitting optimization to infer both the 3D pose and shape for each sample.