• Corpus ID: 238856950

Playing for 3D Human Recovery

@article{Cai2021PlayingF3,
  title={Playing for 3D Human Recovery},
  author={Zhongang Cai and Mingyuan Zhang and Jiawei Ren and Chen Wei and Daxuan Ren and Jiatong Li and Zhengyu Lin and Haiyu Zhao and Shuai Yi and Lei Yang and Chen Change Loy and Ziwei Liu},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.07588}
}
Image- and video-based 3D human recovery (i.e. pose and shape estimation) have achieved substantial progress. However, due to the prohibitive cost of motion capture, existing datasets are often limited in scale and diversity, which hinders the further development of more powerful models. In this work, we obtain massive human sequences as well as their 3D ground truths by playing video games. Specifically, we contribute GTA-Human, a mega-scale and highly-diverse 3D human dataset generated with…
Recovering 3D Human Mesh from Monocular Images: A Survey
TLDR
This is the first survey to focus on the task of monocular 3D human mesh recovery; it starts with an introduction of body models and then elaborates on recovery frameworks and training objectives, providing in-depth analyses of their strengths and weaknesses.
HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling
TLDR
HuMMan is a large-scale multi-modal 4D human dataset with 1000 human subjects, 400k sequences and 60M frames; it voices the need for further study on challenges such as fine-grained action recognition, dynamic human mesh reconstruction, and textured mesh reconstruction.
AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars
TLDR
By leveraging the priors learned in the motion VAE, a CLIP-guided reference-based motion synthesis method is proposed for the animation of the generated 3D avatar, which validates the effectiveness and generalizability of texture generation.

References

SMPLy Benchmarking 3D Human Pose Estimation in the Wild
TLDR
This work presents a pipeline to easily produce and validate such a dataset with accurate ground truth, with which to benchmark recent 3D human pose estimation methods in the wild on the recently introduced Mannequin Challenge dataset.
Sim2real transfer learning for 3D human pose estimation: motion to the rescue
TLDR
This paper shows that standard neural-network approaches, which perform poorly when trained on synthetic RGB images, can perform well when the data is pre-processed to extract cues about the person’s motion, notably as optical flow and the motion of 2D keypoints.
Delving Deep Into Hybrid Annotations for 3D Human Recovery in the Wild
TLDR
This work focuses on the challenging task of in-the-wild 3D human recovery from single images when paired 3D annotations are not fully available, and shows that incorporating dense correspondence into in-the-wild 3D human recovery is promising and competitive due to its high efficiency and relatively low annotating cost.
VIBE: Video Inference for Human Body Pose and Shape Estimation
TLDR
This work defines a novel temporal network architecture with a self-attention mechanism and shows that adversarial training, at the sequence level, produces kinematically plausible motion sequences without in-the-wild ground-truth 3D labels.
Chasing the Tail in Monocular 3D Human Reconstruction With Prototype Memory
TLDR
This work proposes a prototype memory-augmented network, PM-Net, that significantly improves the models’ performance on rare poses while generating comparable results on other samples.
Learning 3D Human Dynamics From Video
TLDR
The approach is designed so it can learn from videos with 2D pose annotations in a semi-supervised manner and obtain state-of-the-art performance on the 3D prediction task without any fine-tuning.
Monocular 3D Human Pose Estimation in the Wild Using Improved CNN Supervision
We propose a CNN-based approach for 3D human body pose estimation from single RGB images that addresses the issue of limited generalizability of models trained solely on the starkly limited publicly
Exploiting Temporal Context for 3D Human Pose Estimation in the Wild
TLDR
This work presents a bundle-adjustment-based algorithm for recovering accurate 3D human pose and meshes from monocular videos, and shows that retraining a single-frame 3D pose estimator on this data improves accuracy on both real-world and mocap data, as evaluated on the 3DPW and HumanEVA datasets.
Towards Accurate Marker-Less Human Shape and Pose Estimation over Time
  • Yinghao Huang
  • Computer Science
    2017 International Conference on 3D Vision (3DV)
  • 2017
TLDR
This work presents a fully automatic method that, given multi-view videos, estimates 3D human pose and body shape; it takes the recently proposed SMPLify method as its base and extends it in several ways.
Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments
We introduce a new dataset, Human3.6M, of 3.6 Million accurate 3D Human poses, acquired by recording the performance of 5 female and 6 male subjects, under 4 different viewpoints, for training