Learning Monocular 3D Human Pose Estimation from Multi-view Images

@inproceedings{Rhodin2018LearningM3,
  title={Learning Monocular 3D Human Pose Estimation from Multi-view Images},
  author={Helge Rhodin and J{\"o}rg Sp{\"o}rri and Isinsu Katircioglu and Victor Constantin and Fr{\'e}d{\'e}ric Meyer and Erich M{\"u}ller and Mathieu Salzmann and Pascal V. Fua},
  booktitle={2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2018},
  pages={8437--8446}
}
Accurate 3D human pose estimation from single images is possible with sophisticated deep-net architectures that have been trained on very large datasets. However, this still leaves open the problem of capturing motions for which no such database exists. Manual annotation is tedious, slow, and error-prone. In this paper, we propose to replace most of the annotations by the use of multiple views, at training time only. Specifically, we train the system to predict the same pose in all views. Such… 
CanonPose: Self-Supervised Monocular 3D Human Pose Estimation in the Wild
TLDR
A self-supervised approach is proposed that learns a single-image 3D pose estimator from unlabeled multi-view data; it does not require calibrated cameras and can therefore learn from moving cameras.
On Boosting Single-Frame 3D Human Pose Estimation via Monocular Videos
TLDR
This paper proposes to exploit monocular videos to complement the training dataset for single-image 3D human pose estimation, and successfully makes the model learn new poses from unlabelled monocular videos, improving the accuracy of the baseline model by about 10%.
Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis
TLDR
This work proposes a self-supervised learning framework to disentangle variations from unlabeled video frames, and demonstrates state-of-the-art weakly-supervised 3D pose estimation performance on both Human3.6M and MPI-INF-3DHP datasets.
Unsupervised 3D Human Pose Estimation in Multi-view-multi-pose Video
TLDR
To improve the performance of 3D human pose estimation in videos, a new GAN is proposed that enforces body consistency over frames in a video and does not require any ground-truth 3D data.
Weakly-Supervised 3D Human Pose Learning via Multi-View Images in the Wild
TLDR
A novel end-to-end learning framework that enables weakly-supervised training using multi-view consistency and proposes a novel objective function that can only be minimized when the predictions of the trained model are consistent and plausible across all camera views.
3D Human Pose Estimation under limited supervision using Metric Learning
TLDR
This work proposes a metric-learning-based approach to jointly learn a rich embedding and 3D pose regression from the embedding, using multi-view synchronised videos of human motions and very limited 3D pose annotations, to improve the performance when 3D supervision is limited.
MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision
TLDR
This paper shows how to train a neural model that can perform accurate 3D pose and camera estimation, takes into account joint location uncertainty due to occlusion from multiple views, and requires only 2D keypoint data for training.
Generalizing Monocular 3D Human Pose Estimation in the Wild
TLDR
This paper proposes a principled approach to generate high-quality 3D pose ground truth given any in-the-wild image with a person inside, and builds a large-scale dataset, which enables the training of a high-quality neural network model, without a specialized training scheme or auxiliary loss function, that performs favorably against the state-of-the-art 3D pose estimation methods.
On Triangulation as a Form of Self-Supervision for 3D Human Pose Estimation
TLDR
This work trains a 2D pose estimator in such a way that its predictions correspond to the re-projection of the triangulated 3D one and trains an auxiliary network on them to produce the final 3D poses, and complements the triangulation with a weighting mechanism that nullifies the impact of noisy predictions caused by self-occlusion or occlusion from other subjects.

References

Monocular 3D Human Pose Estimation in the Wild Using Improved CNN Supervision
We propose a CNN-based approach for 3D human body pose estimation from single RGB images that addresses the issue of limited generalizability of models trained solely on the starkly limited publicly
Harvesting Multiple Views for Marker-Less 3D Human Pose Annotations
TLDR
A geometry-driven approach to automatically collect annotations for human pose prediction tasks and achieves state-of-the-art results on standard benchmarks, demonstrating the effectiveness of the method in exploiting the available multi-view information.
Synthesizing Training Images for Boosting Human 3D Pose Estimation
TLDR
It is shown that pose space coverage and texture diversity are the key ingredients for the effectiveness of synthetic training data and CNNs trained with the authors' synthetic images out-perform those trained with real photos on 3D pose estimation tasks.
Weakly-supervised Transfer for 3D Human Pose Estimation in the Wild
TLDR
A weakly-supervised transfer learning method that learns an end-to-end network using training data with mixed 2D and 3D labels that produces high quality 3D human poses in the wild, without supervision of in-the-wild 3D data.
A Simple Yet Effective Baseline for 3d Human Pose Estimation
TLDR
The results indicate that a large portion of the error of modern deep 3d pose estimation systems stems from their visual analysis, and suggest directions to further advance the state of the art in 3d human pose estimation.
MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild
TLDR
This paper introduces an image-based synthesis engine that artificially augments a dataset of real images with 2D human pose annotations using 3D Motion Capture (MoCap) data to generate a large set of photorealistic synthetic images of humans with 3D pose annotations.
Learning from Synthetic Humans
TLDR
This work presents SURREAL (Synthetic hUmans foR REAL tasks): a new large-scale dataset with synthetically-generated but realistic images of people rendered from 3D sequences of human motion capture data and shows that CNNs trained on this synthetic dataset allow for accurate human depth estimation and human part segmentation in real RGB images.
LCR-Net: Localization-Classification-Regression for Human Pose
TLDR
This work proposes an end-to-end architecture for joint 2D and 3D human pose estimation in natural images that significantly outperforms the state of the art in 3D pose estimation on Human3.6M, a controlled environment.
Unite the People: Closing the Loop Between 3D and 2D Human Representations
TLDR
This work proposes a hybrid approach to 3D body model fits for multiple human pose datasets with an extended version of the recently introduced SMPLify method, and shows that UP-3D can be enhanced with these improved fits to grow in quantity and quality, which makes the system deployable on large scale.
3D Human Pose Estimation from Monocular Images with Deep Convolutional Neural Network
TLDR
A deep convolutional neural network for 3D human pose estimation from monocular images is proposed, and it is empirically shown that the network has disentangled the dependencies among different body parts and learned their correlations.