MonoCap: Monocular Human Motion Capture using a CNN Coupled with a Geometric Prior

Xiaowei Zhou, Menglong Zhu, Georgios Pavlakos, Spyridon Leonardos, Konstantinos G. Derpanis, Kostas Daniilidis
IEEE Transactions on Pattern Analysis and Machine Intelligence

Recovering 3D full-body human pose is a challenging problem with many applications. [...] We introduce a novel approach that treats 2D joint locations as latent variables whose uncertainty distributions are given by a deep fully convolutional neural network.
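
As a rough illustration of reading a network heatmap as an uncertainty distribution over a 2D joint location, the sketch below normalizes a small heatmap and takes its mean. This is not the inference procedure of the paper, only a minimal example of the "heatmap as distribution" idea; the function name `expected_joint_location` and the toy heatmap are hypothetical.

```python
def expected_joint_location(heatmap):
    """Return the (x, y) mean of a 2D heatmap treated as an unnormalized distribution.

    `heatmap` is a list of rows; the row index is y, the column index is x.
    """
    total = sum(sum(row) for row in heatmap)
    if total <= 0:
        raise ValueError("heatmap must have positive mass")
    # Weight each pixel coordinate by its share of the heatmap mass.
    x_mean = sum(x * v for row in heatmap for x, v in enumerate(row)) / total
    y_mean = sum(y * v for y, row in enumerate(heatmap) for v in row) / total
    return x_mean, y_mean


# Toy 3x3 heatmap (hypothetical values) with mass split between two pixels.
heatmap = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0],
]
location = expected_joint_location(heatmap)
```

A flat or multi-modal heatmap produces a mean that lies between the modes, which is one reason to keep the full distribution rather than a single point estimate.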

Real‐time 3D human pose and motion reconstruction from monocular RGB videos

This work presents a method that captures and reconstructs the 3D skeletal pose and motion articulation of multiple characters using a monocular RGB camera, taking advantage of the recent development in deep learning that allows two‐dimensional (2D) pose estimation of several characters and the increasing availability of motion capture data.

On the role of depth predictions for 3D human pose estimation

This work builds a system that takes 2D joint locations together with their estimated depth values as input and predicts their 3D positions in camera coordinates, and it explains how the state-of-the-art results on the H3.6M validation set are due to the additional depth input.
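
For intuition about why a per-joint depth is such a strong cue, the sketch below back-projects a 2D joint with a known depth into camera coordinates using the standard pinhole model. The summarized system is learned rather than purely geometric, so this is only the underlying geometry; the function name `backproject` and the intrinsics `fx`, `fy`, `cx`, `cy` are assumed for illustration.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Map pixel (u, v) at the given depth to a 3D point in camera coordinates.

    Inverts the pinhole projection u = fx * X / Z + cx, v = fy * Y / Z + cy.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# Hypothetical intrinsics: focal length 1000 px, principal point (320, 240).
joint_3d = backproject(820.0, 240.0, 2.0, 1000.0, 1000.0, 320.0, 240.0)
```

With exact depth and intrinsics the 3D position is fully determined, so the remaining difficulty lies in how accurate the depth estimate is.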

SDM3d: shape decomposition of multiple geometric priors for 3D pose estimation

SDM3d separates a 3D pose into a global structure and body deformations, each encoded explicitly via a different prior constraint, and designs a joint learning strategy that learns two over-complete dictionaries from training data to capture more geometric prior information.

Ordinal Depth Supervision for 3D Human Pose Estimation

This work proposes to use a weaker supervision signal provided by the ordinal depths of human joints, which achieves new state-of-the-art performance on the relevant benchmarks and validates the effectiveness of ordinal depth supervision for 3D human pose.
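
One common way to train on ordinal depth relations is a pairwise ranking loss on predicted depths. The sketch below is a minimal version under that assumption and is not claimed to be the exact loss of the summarized work; the function name `ordinal_depth_loss` and the encoding of `relation` are hypothetical.

```python
import math


def ordinal_depth_loss(z_i, z_j, relation):
    """Pairwise loss on predicted depths z_i and z_j.

    relation = +1 if joint i is annotated as closer than joint j,
               -1 if it is annotated as farther,
                0 if the two depths are annotated as roughly equal.
    """
    diff = z_i - z_j
    if relation == 0:
        # Similar depths: penalize any difference.
        return diff * diff
    # Ordered pair: small when the predicted order matches the annotation.
    return math.log1p(math.exp(relation * diff))


correct_order = ordinal_depth_loss(1.0, 3.0, 1)   # i predicted closer, as annotated
wrong_order = ordinal_depth_loss(3.0, 1.0, 1)     # i predicted farther, contradicting the label
```

Only the relative order of the two depths is supervised, so annotators need to say which joint is closer, not how far away it is.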

Deep Monocular 3D Human Pose Estimation via Cascaded Dimension-Lifting

This work decomposes the task of lifting pose from 2D image space to 3D space into several sequential sub-tasks: 1) estimation of kinematic skeletons and individual joints in 2D space, 2) root-relative depth estimation, and 3) lifting to 3D space, which employs direct supervision and contextual image features to guide the learning process.

Synthetic Training for Monocular Human Mesh Recovery

A depth-to-scale (D2S) projection is proposed that incorporates the depth difference into the projection function to derive per-joint scale variants, giving more appropriate supervision for 3D human mesh recovery from monocular images.

Weakly-Supervised Discovery of Geometry-Aware Representation for 3D Human Pose Estimation

A geometry-aware 3D representation for the human pose is proposed, learned from multiple views with a simple auto-encoder model at the training stage using only 2D keypoint information as supervision, and the representation is injected as a robust 3D prior.

Can 3D Pose be Learned from 2D Projections Alone?

This work proposes a novel Random Projection layer, which randomly projects the generated 3D skeleton and sends the resulting 2D pose to the discriminator, utilizing an adversarial framework to impose a prior on the 3D structure, learned solely from their random 2D projections.
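
The idea of projecting a generated 3D skeleton from a random viewpoint can be sketched as follows. This is a simplified illustration that assumes a rotation about the vertical axis followed by an orthographic projection; the summarized layer may sample viewpoints and project differently, and the function name `random_projection` is hypothetical.

```python
import math
import random


def random_projection(skeleton_3d, rng):
    """Rotate a 3D skeleton by a random azimuth about the y axis and drop depth.

    `skeleton_3d` is a list of (x, y, z) joints; returns a list of (x, y) joints.
    """
    angle = rng.uniform(0.0, 2.0 * math.pi)
    c, s = math.cos(angle), math.sin(angle)
    projected = []
    for x, y, z in skeleton_3d:
        x_rot = c * x + s * z  # x coordinate after rotating about the y axis
        projected.append((x_rot, y))  # orthographic projection: discard depth
    return projected


# Hypothetical three-joint skeleton; a seeded generator keeps the example repeatable.
skeleton = [(0.0, 1.0, 0.0), (0.3, 0.5, 0.1), (-0.3, 0.5, -0.1)]
pose_2d = random_projection(skeleton, random.Random(0))
```

In an adversarial setup, 2D poses produced this way would be passed to the discriminator, so a skeleton only fools it if it looks plausible from many viewpoints.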

MotioNet: 3D Human Motion Reconstruction from Monocular Video with Skeleton Consistency

MotioNet, a deep neural network that directly reconstructs the motion of a 3D human skeleton from monocular video, is introduced. It is the first data-driven approach that directly outputs a kinematic skeleton, which is a complete, commonly used motion representation.

Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video

This paper addresses the challenge of 3D full-body human pose estimation from a monocular image sequence with a novel approach that integrates a sparsity-driven 3D geometric prior and temporal smoothness and outperforms a publicly available 2D pose estimation baseline on the challenging PennAction dataset.

Marker-Less 3D Human Motion Capture with Monocular Image Sequence and Height-Maps

This work introduces additional built-in knowledge, namely a height-map, into the algorithmic scheme for reconstructing 3D pose and motion from a single calibrated camera view, and formulates a new objective function to estimate 3D motion from the 2D joints detected in the monocular image sequence.

Unconstrained Monocular 3D Human Pose Estimation by Action Detection and Cross-Modality Regression Forest

A framework is presented that applies action detection and 2D pose estimation techniques to infer 3D poses in unconstrained video; it demonstrated promising results, significantly outperforming the relevant state of the art.

Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image

The first method to automatically estimate the 3D pose of the human body as well as its 3D shape from a single unconstrained image is described, showing superior pose accuracy with respect to the state of the art.

3D Human Pose Estimation from Monocular Images with Deep Convolutional Neural Network

A deep convolutional neural network for 3D human pose estimation from monocular images is proposed, and it is shown empirically that the network has disentangled the dependencies among different body parts and learned their correlations.

MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild

This paper introduces an image-based synthesis engine that artificially augments a dataset of real images with 2D human pose annotations using 3D Motion Capture (MoCap) data to generate a large set of photorealistic synthetic images of humans with 3D pose annotations.

Synthesizing Training Images for Boosting Human 3D Pose Estimation

It is shown that pose space coverage and texture diversity are the key ingredients for the effectiveness of synthetic training data, and that CNNs trained with the authors' synthetic images outperform those trained with real photos on 3D pose estimation tasks.

Flowing ConvNets for Human Pose Estimation in Videos

This work proposes a ConvNet architecture that is able to benefit from temporal context by combining information across the multiple frames using optical flow and outperforms a number of others, including one that uses optical flow solely at the input layers, one that regresses joint coordinates directly, and one that predicts heatmaps without spatial fusion.

3D Human Pose Estimation Using Convolutional Neural Networks with 2D Pose Information

This paper tackles the 3D human pose estimation task with end-to-end learning using CNNs and finds that more accurate 3D poses are obtained by combining information on relative positions with respect to multiple joints, instead of just one root joint.

Pose-conditioned joint angle limits for 3D human pose reconstruction

A general parametrization of body pose is defined, and a new multi-stage method is proposed to estimate 3D pose from 2D joint locations using an over-complete dictionary of poses; the method shows good generalization while avoiding impossible poses.