Accurate 3D Hand Pose Estimation for Whole-Body 3D Human Mesh Estimation

@article{Moon2022Accurate3H,
  title={Accurate 3D Hand Pose Estimation for Whole-Body 3D Human Mesh Estimation},
  author={Gyeongsik Moon and Hongsuk Choi and Kyoung Mu Lee},
  journal={2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
  year={2022},
  pages={2307--2316}
}
Whole-body 3D human mesh estimation aims to reconstruct the 3D human body, hands, and face simultaneously. Although several methods have been proposed, accurate prediction of 3D hands, which consist of the 3D wrist and fingers, remains challenging for two reasons. First, the human kinematic chain has not been carefully considered when predicting the 3D wrists. Second, previous works utilize body features for the 3D fingers, where the body feature barely contains finger information. To…

3D Clothed Human Reconstruction in the Wild

TLDR
ClothWild, a 3D clothed human reconstruction framework, is proposed; it is the first to address robustness on in-the-wild images and uses a DensePose-based loss function to reduce the ambiguities of weak supervision.

PyMAF-X: Towards Well-aligned Full-body Model Regression from Monocular Images

TLDR
A Pyramidal Mesh Alignment Feedback loop is proposed in the authors' regression network for well-aligned human mesh recovery and extended to PyMAF-X for the recovery of expressive full-body models to improve the mesh-image alignment and achieve new state-of-the-art results.

Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond Algorithms

TLDR
This work presents the first large-scale benchmarking of various configurations for mesh recovery tasks from three under-explored perspectives beyond algorithms, and identifies the key strategies and remarks that can significantly enhance the model performance.

NeuralAnnot: Neural Annotator for 3D Human Mesh Training Sets

TLDR
3D pseudo-GTs of NeuralAnnot, a neural network-based annotator, are shown to be highly beneficial for training 3D human mesh regressors.

References

SHOWING 1-10 OF 41 REFERENCES

FrankMocap: A Monocular 3D Whole-Body Pose Estimation System via Regression and Integration

TLDR
This paper presents FrankMocap, a fast and accurate whole-body 3D pose estimation system that can produce 3D face, hands, and body simultaneously from in-the-wild monocular images, and demonstrates that the modularized system outperforms both the optimization-based and end-to-end methods of estimating whole-body pose.

Collaborative Regression of Expressive Bodies using Moderation

TLDR
PIXIE is introduced, which produces animatable, whole-body 3D avatars with realistic facial detail from a single image and is shown to estimate a more accurate whole-body shape and more detailed face shape than the state of the art.

AGORA: Avatars in Geography Optimized for Regression Analysis

TLDR
This work introduces AGORA, a synthetic dataset with high realism and highly accurate ground truth, and evaluates existing state-of-the-art methods for 3D human pose estimation on this dataset, finding that most methods perform poorly on images of children.

Monocular Real-time Full Body Capture with Inter-part Correlations

We present the first method for real-time full body capture that estimates shape and motion of body and hands together with a dynamic 3D face model from a single color image. Our approach uses a new

FreiHAND: A Dataset for Markerless Capture of Hand Pose and Shape From Single RGB Images

TLDR
This paper introduces the first large-scale, multi-view hand dataset that is accompanied by both 3D hand pose and shape annotations and proposes an iterative, semi-automated 'human-in-the-loop' approach, which includes hand fitting optimization to infer both the 3D pose and shape for each sample.

Expressive Body Capture: 3D Hands, Face, and Body From a Single Image

TLDR
This work uses the new method, SMPLify-X, to fit SMPL-X to both controlled images and images in the wild, and evaluates 3D accuracy on a new curated dataset comprising 100 images with pseudo ground-truth.

Mask R-CNN

TLDR
This work presents a conceptually simple, flexible, and general framework for object instance segmentation that outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners.

On Self-Contact and Human Pose

TLDR
This work develops new datasets and methods that significantly improve human pose estimation with self-contact and uses the datasets during SPIN training to learn a new 3D human pose regressor, called TUCH (Towards Understanding Contact in Humans).

End-to-End Human Pose and Mesh Reconstruction with Transformers

TLDR
A new method to reconstruct 3D human pose and mesh vertices from a single image using a transformer encoder to jointly model vertex-vertex and vertex-joint interactions, which generates new state-of-the-art results for human mesh reconstruction on the public Human3.6M and 3DPW datasets.

NeuralAnnot: Neural Annotator for 3D Human Mesh Training Sets

TLDR
3D pseudo-GTs of NeuralAnnot, a neural network-based annotator, are shown to be highly beneficial for training 3D human mesh regressors.