Single-Shot Multi-person 3D Pose Estimation from Monocular RGB

@article{Mehta2018SingleShotM3,
  title={Single-Shot Multi-person 3D Pose Estimation from Monocular RGB},
  author={Dushyant Mehta and Oleksandr Sotnychenko and Franziska Mueller and Weipeng Xu and Srinath Sridhar and Gerard Pons-Moll and Christian Theobalt},
  journal={2018 International Conference on 3D Vision (3DV)},
  year={2018},
  pages={120-130}
}
We propose a new single-shot method for multi-person 3D pose estimation in general scenes from a monocular RGB camera. Our approach uses novel occlusion-robust pose-maps (ORPM) which enable full body pose inference even under strong partial occlusions by other people and objects in the scene. ORPM outputs a fixed number of maps which encode the 3D joint locations of all people in the scene. Body part associations [8] allow us to infer 3D pose for an arbitrary number of people without explicit…
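The ORPM read-out can be pictured as sampling per-joint 3D coordinate maps at 2D body-part locations obtained from the association step. Below is a minimal illustrative sketch of that idea, not the authors' implementation: the map layout (three coordinate channels per joint), the read_pose_at helper, and the single read-out pixel per person are assumptions made for clarity.

```python
import numpy as np

def read_pose_at(orpm: np.ndarray, readout_px: tuple[int, int]) -> np.ndarray:
    """Sample a full-body root-relative 3D pose from occlusion-robust pose-maps.

    orpm       : array of shape (3 * num_joints, H, W); x, y, z maps per joint
    readout_px : (u, v) pixel at which to read, e.g. a person's detected neck
    returns    : (num_joints, 3) root-relative 3D joint positions
    """
    u, v = readout_px
    num_joints = orpm.shape[0] // 3
    # Every coordinate channel is sampled at the same pixel, then regrouped per joint.
    return orpm[:, v, u].reshape(num_joints, 3)

# Dummy example: 17 joints, 64x64 maps, read-out at pixel (20, 31).
orpm = np.random.randn(3 * 17, 64, 64).astype(np.float32)
pose_3d = read_pose_at(orpm, (20, 31))   # -> shape (17, 3)
```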
SMAP: Single-Shot Multi-Person Absolute 3D Pose Estimation
A novel system that first regresses a set of 2.5D representations of body parts and then reconstructs the 3D absolute poses based on these 2.5D representations.
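Reconstructing absolute 3D poses from a 2.5D representation ultimately rests on standard pinhole back-projection of each 2D joint and its depth into camera space. A minimal sketch of that lifting step follows, assuming known intrinsics K and per-joint absolute depths; SMAP's own depth-aware part association is not reproduced here.

```python
import numpy as np

def backproject(joints_2d: np.ndarray, depths: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Lift 2D joints plus per-joint absolute depth to camera-space 3D.

    joints_2d : (J, 2) pixel coordinates (u, v)
    depths    : (J,)   absolute depth Z of each joint
    K         : (3, 3) pinhole intrinsics [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]
    returns   : (J, 3) camera-space joint positions
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    X = (joints_2d[:, 0] - cx) * depths / fx
    Y = (joints_2d[:, 1] - cy) * depths / fy
    return np.stack([X, Y, depths], axis=-1)
```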
Multi-Person 3D Human Pose Estimation from Monocular Images
HG-RCNN, a Mask R-CNN based network that also leverages the benefits of the Hourglass architecture for multi-person 3D human pose estimation, achieves state-of-the-art results on MuPoTS-3D while also approximating the 3D pose in the camera coordinate system.
XNect: Real-time Multi-person 3D Human Pose Estimation with a Single RGB Camera
A real-time approach for multi-person 3D motion capture at over 30 fps using a single RGB camera, together with a new CNN architecture, SelecSLS Net, that uses novel selective long- and short-range skip connections to improve information flow, allowing a drastically faster network without compromising accuracy.
Camera Distance-Aware Top-Down Approach for 3D Multi-Person Pose Estimation From a Single RGB Image
This work proposes a fully learning-based, camera distance-aware top-down approach for 3D multi-person pose estimation from a single RGB image; it achieves results comparable with state-of-the-art 3D single-person pose estimation models without any ground-truth information and significantly outperforms previous 3D multi-person pose estimation methods on publicly available datasets.
Cross-View Tracking for Multi-Human 3D Pose Estimation at Over 100 FPS
This paper proposes to retain the 3D pose for each person and update it iteratively via cross-view multi-human tracking; the system achieves 154 FPS on 12 cameras and 34 FPS on 28 cameras, indicating its ability to handle large-scale real-world applications.
Exploring Severe Occlusion: Multi-Person 3D Pose Estimation with Gated Convolution
This paper proposes a temporal regression network with a gated convolution module to lift 2D joints to 3D while recovering missing occluded joints; the proposed method outperforms most state-of-the-art 2D-to-3D pose estimation methods, especially in scenarios with heavy occlusion.
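The gating mechanism that such gated convolution modules build on can be illustrated with a small PyTorch module: one temporal convolution produces features while a second produces a sigmoid gate that can suppress unreliable (e.g. occluded) inputs. This is an illustrative sketch under assumed shapes and names, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GatedTemporalConv(nn.Module):
    """Minimal gated 1D convolution over a sequence of flattened 2D poses.

    Input  : (batch, 2 * num_joints, time), with occluded joints e.g. zeroed out.
    Output : (batch, channels, time) gated temporal features.
    """
    def __init__(self, num_joints: int = 17, channels: int = 256, kernel: int = 3):
        super().__init__()
        pad = kernel // 2
        self.feat = nn.Conv1d(2 * num_joints, channels, kernel, padding=pad)
        self.gate = nn.Conv1d(2 * num_joints, channels, kernel, padding=pad)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The sigmoid gate scales each feature channel per frame into [0, 1].
        return self.feat(x) * torch.sigmoid(self.gate(x))

# Example: batch of 4 sequences, 17 joints, 81 frames.
seq = torch.randn(4, 2 * 17, 81)
out = GatedTemporalConv()(seq)   # -> (4, 256, 81)
```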
Monocular 3D multi-person pose estimation via predicting factorized correction factors
  • Yu Guo, Lichen Ma, Zhi Li, Xuan Wang, Fei Wang. Computer Vision and Image Understanding, 2021
A pipeline consisting of human detection, absolute 3D human root localization, and root-relative 3D single-person pose estimation modules, together with a data augmentation strategy to tackle occlusions, so that the model can effectively estimate the root localization even from incomplete bounding boxes.
Body Size and Depth Disambiguation in Multi-Person Reconstruction from Single Images
This work devises a novel optimization scheme that learns the appropriate body scale and relative camera pose by enforcing the feet of all people to remain on the ground floor, and is able to robustly estimate the body translation and shape of multiple people while retrieving their spatial arrangement.
LCR-Net++: Multi-Person 2D and 3D Pose Detection in Natural Images
The approach significantly outperforms the state of the art in 3D pose estimation on Human3.6M, a controlled environment, shows promising results on real images for both the single- and multi-person subsets of the MPII 2D pose benchmark, and demonstrates satisfying 3D pose results even for multi-person images.
HDNet: Human Depth Estimation for Multi-Person Camera-Space Localization
The Human Depth Estimation Network (HDNet), an end-to-end framework for absolute root joint localization in the camera coordinate space, is proposed and shown to consistently outperform the previous state of the art under multiple evaluation metrics.

References

Showing 1-10 of 70 references
Monocular 3D Pose and Shape Estimation of Multiple People in Natural Scenes: The Importance of Multiple Scene Constraints
This paper leverages state-of-the-art deep multi-task neural networks and parametric human and scene modeling towards a fully automatic monocular visual sensing system for multiple interacting people, which infers the 2D and 3D pose and shape of multiple people from a single image.
Monocular 3D Human Pose Estimation in the Wild Using Improved CNN Supervision
We propose a CNN-based approach for 3D human body pose estimation from single RGB images that addresses the issue of limited generalizability of models trained solely on the starkly limited publicly…
ArtTrack: Articulated Multi-Person Tracking in the Wild
This paper uses a model that resembles existing architectures for single-frame pose estimation but is substantially faster at generating proposals for body joint locations, and formulates articulated tracking as spatio-temporal grouping of such proposals.
DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation
An approach that jointly solves the tasks of detection and pose estimation is proposed: it infers the number of persons in a scene, identifies occluded body parts, and disambiguates body parts between people in close proximity to each other.
VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera
This work presents the first real-time method to capture the full global 3D skeletal pose of a human in a stable, temporally consistent manner using a single RGB camera, and shows that the approach is more broadly applicable than RGB-D solutions, i.e., it works for outdoor scenes, community videos, and low-quality commodity RGB cameras.
Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video
This paper addresses the challenge of 3D full-body human pose estimation from a monocular image sequence with a novel approach that integrates a sparsity-driven 3D geometric prior and temporal smoothness, and outperforms a publicly available 2D pose estimation baseline on the challenging PennAction dataset.
Pose-conditioned joint angle limits for 3D human pose reconstruction
A general parametrization of body pose is defined, along with a new multi-stage method to estimate 3D pose from 2D joint locations using an over-complete dictionary of poses, which shows good generalization while avoiding impossible poses.
Towards Accurate Multi-person Pose Estimation in the Wild
This work proposes a method for multi-person detection and 2D pose estimation that achieves state-of-the-art results on the challenging COCO keypoints task by using a novel form of keypoint-based non-maximum suppression (NMS), instead of the cruder box-level NMS, and by introducing a novel aggregation procedure to obtain highly localized keypoint predictions.
Multi-person Pose Estimation with Local Joint-to-Person Associations
This work proposes a method that estimates the poses of multiple persons in an image, in which a person can be occluded by another person or might be truncated, and considers multi-person pose estimation as a joint-to-person association problem.
Monocular 3D Human Pose Estimation by Predicting Depth on Joints
The empirical evaluation on the Human3.6M and HHOI datasets demonstrates the advantage of combining a global 2D skeleton and local image patches for depth prediction, and the superior quantitative and qualitative performance relative to state-of-the-art methods.