Improving Robustness and Accuracy via Relative Information Encoding in 3D Human Pose Estimation

  • Wenkang Shan, Haopeng Lu, Shanshe Wang, Xinfeng Zhang, Wen Gao
  • Published 29 July 2021
  • Computer Science
  • Proceedings of the 29th ACM International Conference on Multimedia
Most of the existing 3D human pose estimation approaches mainly focus on predicting 3D positional relationships between the root joint and other human joints (local motion) instead of the overall trajectory of the human body (global motion). Despite the great progress achieved by these approaches, they are not robust to global motion, and lack the ability to accurately predict local motion with a small movement range. To alleviate these two problems, we propose a relative information encoding… 
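
The distinction the abstract draws between local motion and global motion can be illustrated with a minimal sketch. It is not the paper's relative information encoding, only the decomposition the abstract describes: the root joint's position over time is the global trajectory, and each joint's offset from the root is the local motion. The function name `split_global_local` and the choice of joint 0 as the root are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, ...]
Pose = List[Point]  # one 3D point per joint


def split_global_local(poses: List[Pose], root: int = 0):
    """Split a pose sequence into a root trajectory and root-relative poses.

    `root` is the index of the root joint (assumed to be 0 here).
    """
    # Global motion: where the root joint is in each frame.
    trajectory = [frame[root] for frame in poses]
    # Local motion: every joint expressed relative to that frame's root.
    local = [
        [tuple(c - r for c, r in zip(joint, frame[root])) for joint in frame]
        for frame in poses
    ]
    return trajectory, local


# Two frames, two joints: the whole body shifts by +1 along x between frames.
sequence = [
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
]
trajectory, local = split_global_local(sequence)
```

In this toy sequence the root-relative poses are identical in both frames, so all of the movement lives in the trajectory; a model trained only on the root-relative part never sees that component.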

Absolute 3D Human Pose Estimation Using Noise-Aware Radial Distance Predictions

This work presents a simple yet effective pipeline for absolute three-dimensional (3D) human pose estimation from two-dimensional joint keypoints (the 2D-to-3D human pose lifting problem), adopting a Siamese architecture that enforces the consistency of features between two training inputs.

G2O-Pose: Real-Time Monocular 3D Human Pose Estimation Based on General Graph Optimization

This work regards the 3D human pose as a graph and solves the problem by general graph optimization (G2O) under multiple constraints; the method outperforms previous non-deep-learning methods in running speed, with only a slight decrease in accuracy.

Local to Global Transformer for Video Based 3d Human Pose Estimation

This paper proposes a method that combines local human body parts and global skeleton joints using a temporal transformer to finely track the temporal motion of human body parts and obtain the target 3D human pose.

Top-Down System for Multi-Person 3D Absolute Pose Estimation from Monocular Videos

A real-time framework for multi-person 3D absolute pose estimation from a monocular camera is proposed, which integrates a human detector, a 2D pose estimator, a 3D root-relative pose reconstructor, and a root depth estimator in a top-down manner.

Ray3D: ray-based 3D human pose estimation for monocular absolute 3D localization

A novel monocular ray-based 3D (Ray3D) absolute human pose estimation method with a calibrated camera is presented; it significantly outperforms existing state-of-the-art models and converts the input from pixel space to 3D normalized rays to make the approach robust to changes in camera intrinsic parameters.
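
Converting a pixel to a normalized 3D ray can be sketched with the standard pinhole back-projection. This is a generic illustration rather than Ray3D's actual implementation; the function name `pixel_to_ray` and the intrinsic values (focal lengths `fx`, `fy` and principal point `cx`, `cy`) are assumptions chosen for the example.

```python
import math
from typing import Tuple


def pixel_to_ray(u: float, v: float, fx: float, fy: float,
                 cx: float, cy: float) -> Tuple[float, float, float]:
    """Back-project pixel (u, v) to a unit-length ray in camera coordinates."""
    # Remove the intrinsics: shift by the principal point, scale by focal length.
    x = (u - cx) / fx
    y = (v - cy) / fy
    # The direction (x, y, 1) is then normalized to unit length.
    norm = math.sqrt(x * x + y * y + 1.0)
    return (x / norm, y / norm, 1.0 / norm)


# A pixel at the principal point maps to the optical axis.
center_ray = pixel_to_ray(320.0, 240.0, fx=1000.0, fy=1000.0, cx=320.0, cy=240.0)
# Any other pixel maps to a unit vector tilted away from the axis.
corner_ray = pixel_to_ray(0.0, 0.0, fx=1000.0, fy=1000.0, cx=320.0, cy=240.0)
```

Because the focal length and principal point are divided out, the same physical direction yields the same ray under different intrinsics, which is the property the summary attributes to the ray representation.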

Spatial-Temporal Parallel Transformer for Arm-Hand Dynamic Estimation

A Spatial-Temporal Parallel Arm-Hand Motion Transformer (PAHMT) is proposed to predict arm and hand dynamics simultaneously; it has advantages over previous state-of-the-art approaches and shows robustness under various challenging scenarios.

Uplift and Upsample: Efficient 3D Human Pose Estimation with Uplifting Transformers

It is shown how masked token modeling can be utilized for temporal upsampling within Transformer blocks, which allows decoupling the sampling rate of input 2D poses from the target frame rate of the video and drastically decreases the total computational complexity.

FLEX: Parameter-free Multi-view 3D Human Motion Reconstruction

This work introduces FLEX (Free muLti-view rEconstruXion), an end-to-end parameter-free multi-view model. In the absence of camera parameters, it outperforms state-of-the-art methods that are not parameter-free by a large margin, while obtaining comparable results when camera parameters are available.

FLEX: Extrinsic Parameters-free Multi-view 3D Human Motion Reconstruction

FLEX is an end-to-end extrinsic parameter-free multi-view model that reconstructs a single consistent skeleton with temporally coherent joint rotations. In the absence of camera parameters, it outperforms state-of-the-art methods that are not extrinsic parameter-free by a large margin.

PedRecNet: Multi-task deep neural network for full 3D human pose and orientation estimation

We present a multitask network that supports various deep neural network based pedestrian detection functions. Besides 2D and 3D human pose, it also supports body and head orientation estimation.



Deep Kinematics Analysis for Monocular 3D Human Pose Estimation

It is shown that optimizing the kinematic structure of noisy 2D inputs is critical for obtaining accurate 3D estimations, and a targeted ablation study shows that each step is critical for the following one to obtain promising results.

Exploiting Temporal Information for 3D Human Pose Estimation

A sequence-to-sequence network is designed, composed of layer-normalized LSTM units with shortcut connections from the input to the output on the decoder side and trained with a temporal smoothness constraint, which helps the network recover temporally consistent 3D poses over a sequence of images even when the 2D pose detector fails.

3D Human Pose Estimation from a Single Image via Distance Matrix Regression

  • F. Moreno-Noguer
  • Computer Science
  • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2017
It is shown that more precise pose estimates can be obtained by representing both the 2D and 3D human poses using NxN distance matrices, and formulating the problem as a 2D-to-3D distance matrix regression.
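
The NxN distance matrix representation can be made concrete with a short sketch: entry (i, j) is the Euclidean distance between joints i and j. It covers only the representation, not the regression network, and the helper name `distance_matrix` and the three-joint toy pose are assumptions for illustration.

```python
import math
from typing import List, Sequence


def distance_matrix(joints: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return the NxN matrix of pairwise Euclidean distances between joints."""
    return [[math.dist(a, b) for b in joints] for a in joints]


# A toy 2D "pose" with three joints forming a 3-4-5 right triangle.
pose_2d = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
edm = distance_matrix(pose_2d)
```

The matrix is symmetric with a zero diagonal and does not change when the pose is translated or rotated. The same helper works for 3D joints, since `math.dist` accepts points of any matching dimension.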

Learning 3D Human Pose from Structure and Motion

This work proposes two anatomically inspired loss functions and uses them with a weakly-supervised learning framework to jointly learn from large-scale in-the-wild 2D and indoor/synthetic 3D data and presents a simple temporal network that exploits temporal and structural cues present in predicted pose sequences to temporally harmonize the pose estimations.

3D Pictorial Structures for Multiple Human Pose Estimation

A novel 3D pictorial structures (3DPS) model is introduced that infers 3D human body configurations from the authors' reduced state space and is generic and applicable to both single and multiple human pose estimation.

A Joint Relationship Aware Neural Network for Single-Image 3D Human Pose Estimation

A joint relationship aware neural network is proposed to take both global and local joint relationships into consideration, and its effectiveness is demonstrated on 3D human pose estimation benchmarks.

Not All Parts Are Created Equal: 3D Pose Estimation by Modeling Bi-Directional Dependencies of Body Parts

A progressive approach is proposed that explicitly accounts for the distinct DOFs among the body parts, and introduces a pose-attribution estimation, where the relative location of a limb joint with respect to the torso, which has the least DOF of a human body, is explicitly estimated and further fed to the joint-estimation module.

Learning Pose Grammar to Encode Human Body Configuration for 3D Pose Estimation

This paper proposes a pose grammar to tackle the problem of 3D human pose estimation, which takes 2D pose as input and learns a generalized 2D-3D mapping function and enforces high-level constraints over human poses.

A Simple Yet Effective Baseline for 3d Human Pose Estimation

The results indicate that a large portion of the error of modern deep 3d pose estimation systems stems from their visual analysis, and suggest directions to further advance the state of the art in 3d human pose estimation.

Exploiting Spatial-Temporal Relationships for 3D Pose Estimation via Graph Convolutional Networks

A novel graph-based method is proposed to tackle the problem of 3D human body and 3D hand pose estimation from a short sequence of 2D joint detections, where domain knowledge about the human hand (body) configurations is explicitly incorporated into the graph convolutional operations to meet the specific demand of the 3D pose estimation.