Context Modeling in 3D Human Pose Estimation: A Unified Perspective

  • Xiaoxuan Ma, Jiajun Su, Chunyu Wang, Hai Ci, Yizhou Wang
  • Published 29 March 2021
  • Computer Science
  • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Estimating 3D human pose from a single image suffers from severe ambiguity, since multiple 3D joint configurations may have the same 2D projection. State-of-the-art methods often rely on context modeling techniques such as the pictorial structure model (PSM) or graph neural networks (GNNs) to reduce this ambiguity. However, no study has rigorously compared them side by side. We therefore first present a general formulation of context modeling of which both PSM and GNN are special cases. By comparing…
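The ambiguity described above can be checked numerically. The sketch below assumes a simple pinhole camera and a made-up three-joint limb; `project` and all coordinates are illustrative and are not taken from the paper. Sliding each joint along its own viewing ray changes the 3D pose (and its limb lengths) but leaves every 2D projection unchanged. This many-to-one mapping is what context such as limb-length priors helps to disambiguate.

```python
import numpy as np

def project(points_3d, focal=1.0):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) image points."""
    points_3d = np.asarray(points_3d, dtype=float)
    return focal * points_3d[:, :2] / points_3d[:, 2:3]

# Hypothetical 3-joint "arm" (shoulder, elbow, wrist) in camera coordinates.
pose_a = np.array([[0.0, 0.0, 4.0],
                   [0.3, 0.1, 4.2],
                   [0.6, 0.0, 4.1]])

# Scale each joint along its viewing ray by an arbitrary positive factor:
# X -> s * X keeps X/Z and Y/Z unchanged, so the 2D projection is identical.
depth_scales = np.array([1.0, 0.8, 1.3])
pose_b = pose_a * depth_scales[:, None]

print(np.allclose(project(pose_a), project(pose_b)))  # True
print(np.allclose(pose_a, pose_b))                     # False
```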

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
  • Wenhao Li, Hong Liu, Hao Tang, Pichao Wang, L. Van Gool
  • Computer Science
  • arXiv
  • 2021
A Multi-Hypothesis Transformer (MHFormer) that learns spatio-temporal representations of multiple plausible pose hypotheses and achieves state-of-the-art results on two challenging datasets: Human3.6M and MPI-INF-3DHP.
VoxelTrack: Multi-Person 3D Human Pose Estimation and Tracking in the Wild
VoxelTrack employs a multi-branch network to jointly estimate 3D poses and re-identification (Re-ID) features for all people in the environment, and outperforms state-of-the-art methods on three public datasets: Shelf, Campus, and CMU Panoptic.
Context-LGM: Leveraging Object-Context Relation for Context-Aware Object Recognition
A novel Contextual Latent Generative Model (Context-LGM) that models the object-context relation hierarchically, introducing a latent generative model with a pair of correlated latent variables that represent the object and the context, respectively, and embedding their correlation through the generative process.
Smoothing Skeleton Avatar Visualizations Using Signal Processing Technology
Evaluates different filters for smoothing movement visualizations while preserving their validity for visual physiotherapeutic assessment, presents a framework for the quantitative evaluation of smoothness and validity, and recommends a suitable filter for stick-figure visualizations in a mobile application.
An Empirical Study of the Collapsing Problem in Semi-Supervised 2D Human Pose Estimation
  • Rongchang Xie, Chunyu Wang, Wenjun Zeng, Yizhou Wang
  • Computer Science
  • 2020
This work presents a surprisingly simple approach that drives the model to learn in the correct direction; when applied to recent pose estimators, it yields significantly better performance than their supervised counterparts on three public datasets.


A Simple Yet Effective Baseline for 3d Human Pose Estimation
The results indicate that a large portion of the error of modern deep 3D pose estimation systems stems from their visual analysis, and suggest directions to further advance the state of the art in 3D human pose estimation.
Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image
Describes the first method to automatically estimate the 3D pose of the human body, as well as its 3D shape, from a single unconstrained image, showing superior pose accuracy relative to the state of the art.
Multi-view Pictorial Structures for 3D Human Pose Estimation
This paper proposes a multi-view pictorial structures model that builds on recent advances in 2D pose estimation and incorporates evidence across multiple viewpoints to allow robust 3D pose estimation.
Towards 3D Human Pose Estimation in the Wild: A Weakly-Supervised Approach
A weakly-supervised transfer learning method that uses mixed 2D and 3D labels in a unified deep neural network with a two-stage cascaded structure to regularize the 3D pose prediction, which is effective in the absence of ground-truth depth labels.
Deep Kinematics Analysis for Monocular 3D Human Pose Estimation
Shows that optimizing the kinematic structure of noisy 2D inputs is critical for obtaining accurate 3D estimates; a targeted ablation study shows that each step is critical for the subsequent one to obtain promising results.
3D Pictorial Structures for Multiple Human Pose Estimation
Introduces a novel 3D pictorial structures (3DPS) model that infers 3D human body configurations from a reduced state space proposed by the authors, and is generic enough to apply to both single- and multiple-human pose estimation.
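Pictorial structure models of this kind score a skeleton as per-joint (unary) evidence plus per-limb (pairwise) terms, then search a discretized state space for the best configuration. The sketch below is a generic exact max-sum dynamic program for a tree-structured PSM, built on assumptions of our own: each joint has K candidate 3D locations, pairwise terms are log-Gaussian penalties on bone length, joint 0 is the root, and `parents[j] < j` for every other joint. `psm_max_sum` and its parameters are hypothetical. This is not the inference procedure of any specific paper listed here.

```python
import numpy as np

def psm_max_sum(candidates, unary, parents, limb_len, sigma=0.05):
    """Exact MAP inference for a tree-structured pictorial structure model (sketch).

    candidates: (J, K, 3) candidate 3D locations per joint
    unary:      (J, K) log-scores per candidate (e.g. from heatmaps)
    parents:    length-J list; parents[0] == -1 (root), parents[j] < j otherwise
    limb_len:   length-J expected bone length to the parent (entry 0 unused)
    Returns the chosen candidate index for each joint.
    """
    J, K, _ = candidates.shape
    # belief[j, k]: best log-score of the subtree rooted at j when j takes candidate k
    belief = np.array(unary, dtype=float)
    best_child = {}
    for j in range(J - 1, 0, -1):  # children are visited before their parents
        p = parents[j]
        # pairwise[kp, kc]: log-Gaussian penalty on deviation from the expected bone length
        d = np.linalg.norm(candidates[p][:, None, :] - candidates[j][None, :, :], axis=-1)
        pairwise = -((d - limb_len[j]) ** 2) / (2 * sigma ** 2)
        scores = pairwise + belief[j][None, :]   # (K_parent, K_child)
        best_child[j] = scores.argmax(axis=1)    # best child candidate per parent candidate
        belief[p] += scores.max(axis=1)          # pass the max-marginal message up the tree
    # Backtrack from the root; parents are always decoded before their children
    choice = np.zeros(J, dtype=int)
    choice[0] = belief[0].argmax()
    for j in range(1, J):
        choice[j] = best_child[j][choice[parents[j]]]
    return choice

# Toy 3-joint chain 0 -> 1 -> 2 with two candidates per joint (all values made up).
cands = np.array([
    [[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]],  # joint 0 (root)
    [[0.0, 0.0, 0.3], [0.0, 0.0, 1.0]],  # joint 1
    [[0.0, 0.0, 0.6], [0.0, 0.0, 2.0]],  # joint 2
])
unary = np.array([[0.0, -10.0], [0.0, 0.0], [0.0, 0.5]])  # joint 2 alone prefers candidate 1
print(psm_max_sum(cands, unary, parents=[-1, 0, 1], limb_len=[0.0, 0.3, 0.3]).tolist())  # [0, 0, 0]
```

In the toy example, the bone-length prior overrides the weak unary preference of joint 2, which is the kind of context a PSM contributes.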
Cross View Fusion for 3D Human Pose Estimation
This work introduces a cross-view fusion scheme into CNNs to jointly estimate 2D poses for multiple views, and presents a recursive Pictorial Structure Model to recover the 3D pose from the multi-view 2D poses.
Exploiting Spatial-Temporal Relationships for 3D Pose Estimation via Graph Convolutional Networks
A novel graph-based method for 3D human body and 3D hand pose estimation from a short sequence of 2D joint detections, in which domain knowledge about human hand (body) configurations is explicitly incorporated into the graph convolutional operations to meet the specific demands of 3D pose estimation.
Ordinal Depth Supervision for 3D Human Pose Estimation
This work proposes to use a weaker supervision signal provided by the ordinal depths of human joints, achieves new state-of-the-art performance on the relevant benchmarks, and validates the effectiveness of ordinal depth supervision for 3D human pose estimation.
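Ordinal depth labels are typically enforced with a pairwise ranking loss. The sketch below uses a softplus (logistic) penalty for ordered pairs and a squared difference for pairs labeled as equally deep, which is a common form for such a loss. Whether it matches the paper's exact formulation is an assumption. `ordinal_depth_loss` and its label convention are hypothetical.

```python
import numpy as np

def ordinal_depth_loss(z, pairs, relations):
    """Pairwise ranking loss on predicted joint depths (sketch).

    z:         (J,) predicted depths; larger means farther from the camera (assumed)
    pairs:     list of (i, j) joint index pairs
    relations: labels in {+1, -1, 0}:
               +1 -> joint i is closer than joint j
               -1 -> joint j is closer than joint i
                0 -> the two joints are at roughly the same depth
    """
    z = np.asarray(z, dtype=float)
    total = 0.0
    for (i, j), r in zip(pairs, relations):
        diff = z[i] - z[j]
        if r == 0:
            total += diff ** 2                      # pull equal-depth joints together
        else:
            total += np.logaddexp(0.0, r * diff)    # stable log(1 + exp(r * diff))
    return total

z = np.array([1.0, 2.0, 2.0])                    # joint 0 is predicted closest
pairs = [(0, 1), (1, 2)]
right = ordinal_depth_loss(z, pairs, [+1, 0])    # labels agree with the prediction
wrong = ordinal_depth_loss(z, pairs, [-1, 0])    # first label contradicts it
print(right < wrong)  # True
```

Only relative depths enter the loss, which is why such labels are cheap to annotate compared with metric 3D ground truth.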
Optimizing Network Structure for 3D Human Pose Estimation
This work proposes a generic formulation of which both GCN and the Fully Connected Network (FCN) are special cases, and introduces the Locally Connected Network (LCN), which is naturally implemented by this formulation and notably improves representation capability over GCN.
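One layer family that contains both GCN and FCN as special cases can be sketched with a weight tensor indexed by joint pair, combined with a connection mask. The sketch below follows the description above only loosely. `generic_layer`, the chain-shaped skeleton, and the row normalization are assumptions for illustration, not the paper's code. The three settings work as follows:

* Sharing one weight matrix across all joint pairs and masking with a normalized adjacency gives a GCN layer.
* Using an all-ones mask gives a fully connected layer on the flattened pose.
* Keeping independent weights while masking to neighboring joints gives an LCN-style layer.

```python
import numpy as np

def generic_layer(x, W, mask):
    """y_i = sum_j mask[i, j] * (x_j @ W[i, j]) for every joint i (sketch).

    x:    (J, C_in) per-joint input features
    W:    (J, J, C_in, C_out) one weight matrix per (output joint, input joint) pair
    mask: (J, J) connection strengths between joints
    """
    return np.einsum("ij,jc,ijcd->id", mask, x, W)

rng = np.random.default_rng(0)
J, C_in, C_out = 4, 2, 3
x = rng.normal(size=(J, C_in))

# GCN: one shared weight matrix; mask = adjacency with self-loops (hypothetical chain
# skeleton), row-normalized here for simplicity.
A = np.eye(J) + np.eye(J, k=1) + np.eye(J, k=-1)
A_hat = A / A.sum(axis=1, keepdims=True)
W0 = rng.normal(size=(C_in, C_out))
W_shared = np.broadcast_to(W0, (J, J, C_in, C_out))
y_gcn = generic_layer(x, W_shared, A_hat)
print(np.allclose(y_gcn, A_hat @ x @ W0))  # True

# FCN: all joints connected, independent weights for every joint pair.
W_full = rng.normal(size=(J, J, C_in, C_out))
y_fcn = generic_layer(x, W_full, np.ones((J, J)))
W_dense = W_full.transpose(1, 2, 0, 3).reshape(J * C_in, J * C_out)
print(np.allclose(y_fcn.reshape(-1), x.reshape(-1) @ W_dense))  # True

# LCN-style: independent weights, but only neighboring joints are connected.
y_lcn = generic_layer(x, W_full, A)
```

The GCN case trades expressiveness for weight sharing, while the LCN-style case keeps the sparse skeleton connectivity but lets each joint pair learn its own weights.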