Corpus ID: 236447665

Learning Local Recurrent Models for Human Mesh Recovery

@article{Li2021LearningLR,
  title={Learning Local Recurrent Models for Human Mesh Recovery},
  author={Runze Li and Srikrishna Karanam and Ren Li and Terrence Chen and Bir Bhanu and Ziyan Wu},
  journal={ArXiv},
  year={2021},
  volume={abs/2107.12847}
}
We consider the problem of estimating frame-level full human body meshes given a video of a person with natural motion dynamics. While much progress in this field has been in single image-based mesh estimation, there has been a recent uptick in efforts to infer mesh dynamics from video given its role in alleviating issues such as depth ambiguity and occlusions. However, a key limitation of existing work is the assumption that all the observed motion dynamics can be modeled using one dynamical…
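The core idea of modeling motion locally rather than with one global dynamical system can be illustrated with a short sketch. Below is a minimal, hypothetical PyTorch example of per-body-part ("local") recurrent modeling: each group of joints gets its own GRU over per-frame image features. The part grouping, feature dimensions, and fusion step are illustrative assumptions, not the paper's actual architecture.

# Minimal sketch (not the paper's actual architecture): model each body-part
# group with its own recurrent network instead of one global dynamical model.
# Part grouping, feature sizes, and the fusion step are illustrative assumptions.
import torch
import torch.nn as nn

# hypothetical grouping of SMPL joints into local kinematic chains
PART_GROUPS = {
    "torso_head": [0, 3, 6, 9, 12, 15],
    "left_arm":   [13, 16, 18, 20, 22],
    "right_arm":  [14, 17, 19, 21, 23],
    "left_leg":   [1, 4, 7, 10],
    "right_leg":  [2, 5, 8, 11],
}

class LocalRecurrentMeshModel(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=256):
        super().__init__()
        # one GRU per body-part group: each models only its own local dynamics
        self.part_grus = nn.ModuleDict({
            name: nn.GRU(feat_dim, hidden_dim, batch_first=True)
            for name in PART_GROUPS
        })
        # per-part heads predict axis-angle rotations for that group's joints
        self.part_heads = nn.ModuleDict({
            name: nn.Linear(hidden_dim, 3 * len(joints))
            for name, joints in PART_GROUPS.items()
        })
        # shared head for SMPL shape (betas), predicted from fused part states
        self.shape_head = nn.Linear(hidden_dim * len(PART_GROUPS), 10)

    def forward(self, frame_feats):
        # frame_feats: (batch, time, feat_dim) per-frame image features
        pose, last_states = {}, []
        for name, gru in self.part_grus.items():
            states, _ = gru(frame_feats)                 # (B, T, hidden_dim)
            pose[name] = self.part_heads[name](states)   # per-frame local pose
            last_states.append(states[:, -1])
        betas = self.shape_head(torch.cat(last_states, dim=-1))  # (B, 10)
        return pose, betas

# usage with dummy per-frame features: batch of 2 clips, 16 frames each
model = LocalRecurrentMeshModel()
pose, betas = model(torch.randn(2, 16, 2048))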


References

Showing 1-10 of 40 references
Learning 3D Human Dynamics From Video
The approach is designed so it can learn from videos with 2D pose annotations in a semi-supervised manner, and it obtains state-of-the-art performance on the 3D prediction task without any fine-tuning.
Convolutional Mesh Regression for Single-Image Human Shape Reconstruction
This paper addresses the problem of 3D human pose and shape estimation from a single image by proposing a graph-based mesh regression, which outperforms comparable baselines relying on model parameter regression and achieves state-of-the-art results among model-based pose estimation approaches.
VIBE: Video Inference for Human Body Pose and Shape Estimation
This work defines a novel temporal network architecture with a self-attention mechanism and shows that adversarial training, at the sequence level, produces kinematically plausible motion sequences without in-the-wild ground-truth 3D labels.
Exploiting Temporal Context for 3D Human Pose Estimation in the Wild
Presents a bundle-adjustment-based algorithm for recovering accurate 3D human pose and meshes from monocular videos, and shows that retraining a single-frame 3D pose estimator on this data improves accuracy on both real-world and mocap data, as evaluated on the 3DPW and HumanEVA datasets.
RePose: Learning Deep Kinematic Priors for Fast Human Pose Estimation
The final network effectively models the geometric prior and intuition within a lightweight deep neural network, yielding state-of-the-art results for a model of this size on two standard datasets, Leeds Sports Pose and MPII Human Pose.
Learning to Reconstruct 3D Human Pose and Shape via Model-Fitting in the Loop
The core of the proposed approach SPIN (SMPL oPtimization IN the loop) is that the two paradigms can form a strong collaboration: better network estimates lead the optimization to better solutions, while more accurate optimization fits provide better supervision for the network.
End-to-End Recovery of Human Shape and Pose
This work introduces an adversary trained to tell whether human body shape and pose parameters are real or not using a large database of 3D human meshes, and produces a richer and more useful mesh representation that is parameterized by shape and 3D joint angles.
Learning to Estimate 3D Human Pose and Shape from a Single Color Image
This work addresses the problem of estimating full-body 3D human pose and shape from a single color image and proposes an efficient and effective direct prediction method based on ConvNets, incorporating a parametric statistical body shape model (SMPL) within an end-to-end framework (a sketch of this common regression setup appears after this reference list).
OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields
OpenPose is released, the first open-source realtime system for multi-person 2D pose detection, including body, foot, hand, and facial keypoints, and the first combined body and foot keypoint detector, based on an internal annotated foot dataset.
Sim2real transfer learning for 3D human pose estimation: motion to the rescue
This paper shows that standard neural-network approaches, which perform poorly when trained on synthetic RGB images, can perform well when the data is pre-processed to extract cues about the person's motion, notably as optical flow and the motion of 2D keypoints.
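Several of the single-image works above, such as "End-to-End Recovery of Human Shape and Pose", SPIN, and "Learning to Estimate 3D Human Pose and Shape from a Single Color Image", share a common regression setup: a CNN backbone whose features drive a regressor for SMPL pose, shape, and camera parameters. The following is a minimal, hypothetical PyTorch sketch of that setup; the dimensions, three-step iterative refinement, and ResNet-50 backbone are common conventions rather than any single paper's specification, and the SMPL layer itself (e.g. from the smplx package) is omitted.

# Minimal sketch of single-image SMPL-parameter regression: a CNN backbone
# plus an iterative regressor that refines pose/shape/camera estimates.
# All dimensions and the 3-step refinement are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet50

NUM_SMPL_PARAMS = 24 * 3 + 10 + 3  # axis-angle pose + shape betas + weak-perspective camera

class SMPLRegressor(nn.Module):
    def __init__(self, num_iters=3):
        super().__init__()
        backbone = resnet50(weights=None)
        # drop the classification head, keep globally pooled 2048-d features
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])
        self.regressor = nn.Sequential(
            nn.Linear(2048 + NUM_SMPL_PARAMS, 1024), nn.ReLU(),
            nn.Linear(1024, NUM_SMPL_PARAMS),
        )
        self.num_iters = num_iters

    def forward(self, images):
        feats = self.encoder(images).flatten(1)                  # (B, 2048)
        params = torch.zeros(images.size(0), NUM_SMPL_PARAMS,
                             device=images.device)               # start from a zero estimate
        for _ in range(self.num_iters):                          # iterative error feedback
            params = params + self.regressor(torch.cat([feats, params], dim=-1))
        return params  # feed into an SMPL layer to obtain the mesh

# usage with a dummy batch of 224x224 crops
params = SMPLRegressor()(torch.randn(2, 3, 224, 224))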