MonoCap: Monocular Human Motion Capture using a CNN Coupled with a Geometric Prior
@article{Zhou2017MonoCapMH,
  title={MonoCap: Monocular Human Motion Capture using a CNN Coupled with a Geometric Prior},
  author={Xiaowei Zhou and Menglong Zhu and Georgios Pavlakos and Spyridon Leonardos and Konstantinos G. Derpanis and Kostas Daniilidis},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2017},
  volume={41},
  pages={901-914}
}
Recovering 3D full-body human pose is a challenging problem with many applications. [...] We introduce a novel approach that treats 2D joint locations as latent variables whose uncertainty distributions are given by a deep fully convolutional neural network.
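As a rough illustration of what "uncertainty distributions" over 2D joint locations can look like, the sketch below turns a per-joint score map into a probability distribution and reads off its mean location. This is not the authors' implementation: the softmax normalization, the function names `heatmap_to_distribution` and `expected_location`, and the toy 3x3 heatmap are all assumptions made for illustration.

```python
import math


def heatmap_to_distribution(heatmap):
    """Normalize a 2D grid of raw scores into probabilities.

    Assumption: softmax normalization is used here for illustration only;
    the summary above does not say how the network output is normalized.
    """
    peak = max(v for row in heatmap for v in row)
    # Subtract the peak before exponentiating for numerical stability.
    exps = [[math.exp(v - peak) for v in row] for row in heatmap]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def expected_location(dist):
    """Return the mean (x, y) pixel location under the distribution."""
    x = sum(p * col for row in dist for col, p in enumerate(row))
    y = sum(p * r for r, row in enumerate(dist) for p in row)
    return x, y


# Hypothetical 3x3 score map for one joint, peaked at the center pixel.
toy_heatmap = [
    [0.0, 1.0, 0.0],
    [1.0, 3.0, 1.0],
    [0.0, 1.0, 0.0],
]
joint_distribution = heatmap_to_distribution(toy_heatmap)
print(expected_location(joint_distribution))
```

Keeping the full distribution rather than only its peak is what lets a downstream 3D fitting step weigh uncertain joints less heavily; how that fitting is done is beyond what this summary states.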
170 Citations
Real‐time 3D human pose and motion reconstruction from monocular RGB videos
- Computer Science
- Comput. Animat. Virtual Worlds
- 2019
This work presents a method that captures and reconstructs the 3D skeletal pose and motion articulation of multiple characters using a monocular RGB camera, taking advantage of the recent development in deep learning that allows two‐dimensional (2D) pose estimation of several characters and the increasing availability of motion capture data.
On the role of depth predictions for 3D human pose estimation
- Computer Science
- FTC
- 2022
This work builds a system that takes 2D joint locations as input along with their estimated depth values, predicts their 3D positions in camera coordinates, and explains how the state-of-the-art results on the H3.6M validation set are due to the additional input of depth.
SDM3d: shape decomposition of multiple geometric priors for 3D pose estimation
- Computer Science
- Neural Computing and Applications
- 2020
SDM3d separates a 3D pose into global structure and body deformations, each encoded explicitly via a different prior constraint, and uses a joint learning strategy to learn two over-complete dictionaries from training data that capture more geometric prior information.
Ordinal Depth Supervision for 3D Human Pose Estimation
- Computer Science
- 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
This work proposes to use a weaker supervision signal provided by the ordinal depths of human joints, which achieves new state-of-the-art performance on the relevant benchmarks and validates the effectiveness of ordinal depth supervision for 3D human pose.
Deep Monocular 3D Human Pose Estimation via Cascaded Dimension-Lifting
- Computer Science
- ArXiv
- 2021
This work decomposes the task of lifting pose from 2D image space to 3D space into several sequential sub-tasks: 1) estimation of kinematic skeletons and individual joints in 2D space, 2) root-relative depth estimation, and 3) lifting to 3D space, which employs direct supervision and contextual image features to guide the learning process.
Synthetic Training for Monocular Human Mesh Recovery
- Computer Science
- ArXiv
- 2020
A depth-to-scale (D2S) projection is proposed that incorporates the depth difference into the projection function to derive per-joint scale variants, giving more appropriate supervision for 3D human mesh recovery from monocular images.
Weakly-Supervised Discovery of Geometry-Aware Representation for 3D Human Pose Estimation
- Computer Science
- 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
A geometry-aware 3D representation for the human pose is proposed to address this limitation by using multiple views in a simple auto-encoder model at the training stage and only 2D keypoint information as supervision, and injecting the representation as a robust 3D prior.
Reweighted sparse representation with residual compensation for 3D human pose estimation from a single RGB image
- Computer Science
- Neurocomputing
- 2019
Can 3D Pose be Learned from 2D Projections Alone?
- Computer Science
- ECCV Workshops
- 2018
This work proposes a novel Random Projection layer, which randomly projects the generated 3D skeleton and sends the resulting 2D pose to the discriminator, utilizing an adversarial framework to impose a prior on the 3D structure, learned solely from their random 2D projections.
MotioNet: 3D Human Motion Reconstruction from Monocular Video with Skeleton Consistency
- Computer Science
- ACM Trans. Graph.
- 2021
MotioNet, a deep neural network that directly reconstructs the motion of a 3D human skeleton from monocular video, is introduced; it is the first data-driven approach that directly outputs a kinematic skeleton, which is a complete, commonly used motion representation.
76 References
Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video
- Computer Science
- 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
This paper addresses the challenge of 3D full-body human pose estimation from a monocular image sequence with a novel approach that integrates a sparsity-driven 3D geometric prior and temporal smoothness and outperforms a publicly available 2D pose estimation baseline on the challenging PennAction dataset.
Marker-Less 3D Human Motion Capture with Monocular Image Sequence and Height-Maps
- Computer Science
- ECCV
- 2016
This work introduces the additional built-in knowledge, namely height-map, into the algorithmic scheme of reconstructing the 3D pose/motion under a single-view calibrated camera, and forms a new objective function to estimate 3D motion from the detected 2D joints in the monocular image sequence.
Unconstrained Monocular 3D Human Pose Estimation by Action Detection and Cross-Modality Regression Forest
- Computer Science
- 2013 IEEE Conference on Computer Vision and Pattern Recognition
- 2013
A framework is presented that applies action detection and 2D pose estimation techniques to infer 3D poses in unconstrained video; it demonstrates promising results, significantly outperforming the relevant state of the art.
Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image
- Computer Science
- ECCV
- 2016
The first method to automatically estimate the 3D pose of the human body as well as its 3D shape from a single unconstrained image is described, showing superior pose accuracy with respect to the state of the art.
3D Human Pose Estimation from Monocular Images with Deep Convolutional Neural Network
- Computer Science
- ACCV
- 2014
A deep convolutional neural network for 3D human pose estimation from monocular images is proposed, and it is shown empirically that the network has disentangled the dependencies among different body parts and learned their correlations.
MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild
- Computer Science
- NIPS
- 2016
This paper introduces an image-based synthesis engine that artificially augments a dataset of real images with 2D human pose annotations using 3D Motion Capture (MoCap) data to generate a large set of photorealistic synthetic images of humans with 3D pose annotations.
Synthesizing Training Images for Boosting Human 3D Pose Estimation
- Computer Science
- 2016 Fourth International Conference on 3D Vision (3DV)
- 2016
It is shown that pose space coverage and texture diversity are the key ingredients for the effectiveness of synthetic training data, and that CNNs trained with the authors' synthetic images outperform those trained with real photos on 3D pose estimation tasks.
Flowing ConvNets for Human Pose Estimation in Videos
- Computer Science
- 2015 IEEE International Conference on Computer Vision (ICCV)
- 2015
This work proposes a ConvNet architecture that is able to benefit from temporal context by combining information across the multiple frames using optical flow and outperforms a number of others, including one that uses optical flow solely at the input layers, one that regresses joint coordinates directly, and one that predicts heatmaps without spatial fusion.
3D Human Pose Estimation Using Convolutional Neural Networks with 2D Pose Information
- Computer Science
- ECCV Workshops
- 2016
This paper tackles the 3D human pose estimation task with end-to-end learning using CNNs and finds that more accurate 3D poses are obtained by combining information on relative positions with respect to multiple joints, instead of just one root joint.
Pose-conditioned joint angle limits for 3D human pose reconstruction
- Computer Science
- 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015
A general parametrization of body pose is defined, together with a new multi-stage method that estimates 3D pose from 2D joint locations using an over-complete dictionary of poses; the method shows good generalization while avoiding impossible poses.