Towards 3D Human Pose Estimation in the Wild: A Weakly-Supervised Approach

Xingyi Zhou, Qixing Huang, Xiao Sun, X. Xue, Yichen Wei
2017 IEEE International Conference on Computer Vision (ICCV)
In this paper, we study the task of 3D human pose estimation in the wild. This task is challenging due to the lack of training data, as existing datasets contain either in-the-wild images with 2D pose or in-the-lab images with 3D pose. We propose a weakly-supervised transfer learning method that uses mixed 2D and 3D labels in a unified deep neural network with a two-stage cascaded structure. Our network augments a state-of-the-art 2D pose estimation sub-network with a 3D depth regression sub…
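As a rough illustration of what training on mixed 2D and 3D labels can look like, the sketch below combines a 2D term computed for every sample with a depth term computed only for samples that carry 3D labels. This is a simplified assumption for illustration, not the loss used in the paper: the function name `mixed_supervision_loss`, the use of plain joint coordinates, and the squared-error terms are all hypothetical.

```python
import numpy as np

def mixed_supervision_loss(pred_2d, gt_2d, pred_depth, gt_depth, has_3d):
    """Hypothetical loss for a batch that mixes 2D-only and 3D-labelled samples.

    pred_2d, gt_2d:       (B, J, 2) joint coordinates in the image plane
    pred_depth, gt_depth: (B, J) per-joint depth values
    has_3d:               (B,) boolean, True where a sample has 3D labels
    """
    # 2D term: every sample has 2D labels, so all samples contribute.
    loss_2d = np.mean((pred_2d - gt_2d) ** 2)

    # Depth term: only samples with 3D labels contribute.
    mask = has_3d[:, None]                                # (B, 1), broadcasts over joints
    sq_err = np.where(mask, (pred_depth - gt_depth) ** 2, 0.0)
    n_terms = int(has_3d.sum()) * pred_depth.shape[1]
    loss_depth = sq_err.sum() / n_terms if n_terms > 0 else 0.0

    return float(loss_2d + loss_depth)
```

In this sketch a batch with no 3D-labelled samples still produces a gradient signal for the 2D part, which is the basic reason a shared network can be trained on both kinds of data.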

Weakly-Supervised 3D Human Pose Learning via Multi-View Images in the Wild
A novel end-to-end learning framework that enables weakly-supervised training using multi-view consistency and proposes a novel objective function that can only be minimized when the predictions of the trained model are consistent and plausible across all camera views.
Lifting 2d Human Pose to 3d : A Weakly Supervised Approach
This paper proposes a method that can effectively predict 3d human pose from 2d pose using a deep neural network trained in a weakly-supervised manner on a combination of ground-truth 3d pose and ground-truth 2d pose, and demonstrates the superior generalization ability of this method by cross-dataset validation.
In the Wild Human Pose Estimation Using Explicit 2D Features and Intermediate 3D Representations
This work proposes a new deep learning based method for monocular 3D human pose estimation that shows high accuracy and generalizes better to in-the-wild scenes.
Towards Generalization of 3D Human Pose Estimation In The Wild
3DBodyTex.Pose showed promising improvement in overall performance and a noticeable decrease in the per-joint position error when testing on challenging viewpoints, and is expected to offer the research community new possibilities for generalizing 3D pose estimation from monocular in-the-wild images.
Learning Temporal 3D Human Pose Estimation with Pseudo-Labels
A temporal convolutional neural network is trained with the generated 3D ground-truth and the geometric multi-view consistency loss, imposing geometrical constraints on the predicted 3D body skeleton.
Weakly-supervised 3D Human Pose Estimation with Cross-view U-shaped Graph Convolutional Network
A simple yet effective pipeline for weakly-supervised cross-view 3D human pose estimation by only using two camera views, requiring no 3D ground truth but only 2D annotations, which outperforms existing state-of-the-art methods remarkably.
3D Human Pose Estimation in the Wild by Adversarial Learning
An adversarial learning framework is proposed, which distills the 3D human pose structures learned from the fully annotated dataset to in-the-wild images with only 2D pose annotations and designs a geometric descriptor, which computes the pairwise relative locations and distances between body joints, as a new information source for the discriminator.
Geometry-Driven Self-Supervised Method for 3D Human Pose Estimation
A transform re-projection loss is designed as an effective way to exploit multi-view consistency when training the 2D-to-3D lifting network, and the confidences of 2D joints are used to integrate losses from different views, alleviating the influence of noise caused by self-occlusion.
Learning 3D Human Pose from Structure and Motion
This work proposes two anatomically inspired loss functions and uses them with a weakly-supervised learning framework to jointly learn from large-scale in-the-wild 2D and indoor/synthetic 3D data and presents a simple temporal network that exploits temporal and structural cues present in predicted pose sequences to temporally harmonize the pose estimations.
3-D Human Pose Estimation Using Iterative Conditional Squeeze and Excitation Networks
A new method for single-camera real-world 3-D human pose estimation using multitask training together with iterative pose refinement using a novel conditional attention mechanism that is efficient enough to run on commodity hardware, producing pose estimates in real time.


MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild
This paper introduces an image-based synthesis engine that artificially augments a dataset of real images with 2D human pose annotations using 3D Motion Capture (MoCap) data to generate a large set of photorealistic synthetic images of humans with 3D pose annotations.
Monocular 3D Human Pose Estimation in the Wild Using Improved CNN Supervision
We propose a CNN-based approach for 3D human body pose estimation from single RGB images that addresses the issue of limited generalizability of models trained solely on the starkly limited publicly
Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video
This paper addresses the challenge of 3D full-body human pose estimation from a monocular image sequence with a novel approach that integrates a sparsity-driven 3D geometric prior and temporal smoothness and outperforms a publicly available 2D pose estimation baseline on the challenging PennAction dataset.
Monocular 3D Human Pose Estimation Using Transfer Learning and Improved CNN Supervision
We propose a new CNN-based method for regressing 3D human body pose from a single image that improves over the state-of-the-art on standard benchmarks by more than 25%. Our approach addresses the
Synthesizing Training Images for Boosting Human 3D Pose Estimation
It is shown that pose space coverage and texture diversity are the key ingredients for the effectiveness of synthetic training data and CNNs trained with the authors' synthetic images out-perform those trained with real photos on 3D pose estimation tasks.
3D Human Pose Estimation from Monocular Images with Deep Convolutional Neural Network
A deep convolutional neural network for 3D human pose estimation from monocular images is proposed, and it is shown empirically that the network has disentangled the dependencies among different body parts and learned their correlations.
Lifting from the Deep: Convolutional 3D Pose Estimation from a Single Image
An integrated approach is taken that fuses probabilistic knowledge of 3D human pose with a multi-stage CNN architecture and uses the knowledge of plausible 3D landmark locations to refine the search for better 2D locations.
Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose
This paper proposes a fine discretization of the 3D space around the subject and trains a ConvNet to predict per voxel likelihoods for each joint, which creates a natural representation for 3D pose and greatly improves performance over the direct regression of joint coordinates.
MonoCap: Monocular Human Motion Capture using a CNN Coupled with a Geometric Prior
This paper addresses the more challenging case of not only using a single camera but also not leveraging markers: going directly from 2D appearance to 3D geometry, using a novel approach that treats 2D joint locations as latent variables whose uncertainty distributions are given by a deep fully convolutional neural network.
2D Human Pose Estimation: New Benchmark and State of the Art Analysis
A novel benchmark "MPII Human Pose" is introduced that makes a significant advance in terms of diversity and difficulty, a contribution that is required for future developments in human body models.