Learning Dynamical Human-Joint Affinity for 3D Pose Estimation in Videos

Authors: Junhao Zhang, Yali Wang, Zhipeng Zhou, Tianyu Luan, Zhe Wang, Y. Qiao
Journal: IEEE Transactions on Image Processing
Graph Convolution Networks (GCNs) have been successfully used for 3D human pose estimation in videos. However, a GCN is often built on a fixed human-joint affinity defined by the human skeleton. This may reduce the capacity of the GCN to adapt to complex spatio-temporal pose variations in videos. To alleviate this problem, we propose a novel Dynamical Graph Network (DG-Net), which can dynamically identify human-joint affinity, and estimate 3D pose by adaptively learning spatial/temporal joint…
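The contrast the abstract draws between a fixed, skeleton-based joint affinity and one identified dynamically from the data can be illustrated with a toy example. The sketch below is not DG-Net's actual formulation, which the truncated abstract does not specify. It assumes one common way to derive an affinity from features, a row-wise softmax over pairwise dot-product similarity. The function names, the three-joint chain, and the feature values are all hypothetical.

```python
import math

def dynamic_affinity(features):
    """Assumed scheme: row-wise softmax over pairwise dot-product similarity."""
    n = len(features)
    affinity = []
    for i in range(n):
        scores = [sum(a * b for a, b in zip(features[i], features[j]))
                  for j in range(n)]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        affinity.append([e / total for e in exps])
    return affinity

def aggregate(affinity, features):
    """One aggregation step: each joint becomes an affinity-weighted mix of all joints."""
    n, dim = len(features), len(features[0])
    return [[sum(affinity[i][j] * features[j][d] for j in range(n))
             for d in range(dim)]
            for i in range(n)]

# Hypothetical 3-joint chain (hip - knee - ankle) with 2D features per joint.
joint_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

# Fixed affinity: self-loops plus bone connections, row-normalized.
# Hip and ankle share no bone, so their affinity is always zero.
fixed = [[0.5, 0.5, 0.0],
         [1 / 3, 1 / 3, 1 / 3],
         [0.0, 0.5, 0.5]]

dynamic = dynamic_affinity(joint_features)
fixed_out = aggregate(fixed, joint_features)
dynamic_out = aggregate(dynamic, joint_features)
```

With the fixed matrix, the hip never receives information from the ankle, whatever the pose. With the feature-derived matrix, every pair of joints gets a non-zero weight that changes with the input, which is the kind of adaptivity the abstract motivates.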
Pose-guided Generative Adversarial Net for Novel View Action Synthesis
A novel framework named Pose-guided Action Separable Generative Adversarial Net (PAS-GAN) utilizes pose to alleviate the difficulty of novel view action synthesis. Extensive experiments on two large-scale multi-view human action datasets, NTU-RGBD and PKU-MMD, demonstrate the effectiveness of PAS-GAN, which outperforms existing approaches.
The Best of Both Worlds: Combining Model-based and Nonparametric Approaches for 3D Human Body Estimation
This framework combines the strengths of non-parametric and model-based methods, is robust to partial occlusion, and outperforms existing 3D human estimation methods on multiple public benchmarks.
Crafting Better Contrastive Views for Siamese Representation Learning
The proposed ContrastiveCrop carefully selects positive pairs for contrastive learning with negligible extra training overhead. The work also finds empirically that views with similar appearances are trivial for Siamese model training.
CAFE: Learning to Condense Dataset by Aligning Features
This paper proposes a novel scheme to Condense dataset by Aligning FEatures (CAFE), which explicitly attempts to preserve the real-feature distribution as well as the discriminant power of the resulting synthetic set, lending itself to strong generalization capability to various architectures.


GAST-Net: Graph Attention Spatio-temporal Convolutional Networks for 3D Human Pose Estimation in Video
This work improves the learning of kinematic constraints in the human skeleton, namely posture, second-order joint relations, and symmetry, by modeling both local and global spatial information via attention mechanisms and by interleaving spatial information with temporal information to achieve a synergistic effect.
Exploiting Spatial-Temporal Relationships for 3D Pose Estimation via Graph Convolutional Networks
A novel graph-based method to tackle the problem of 3D human body and 3D hand pose estimation from a short sequence of 2D joint detections, where domain knowledge about the human hand (body) configurations is explicitly incorporated into the graph convolutional operations to meet the specific demand of the 3D pose estimation.
Trajectory Space Factorization for Deep Video-Based 3D Human Pose Estimation
A deep learning-based framework that utilizes matrix factorization for sequential 3D human pose estimation and demonstrates its effectiveness on long sequences by achieving state-of-the-art performance on multiple benchmark datasets.
Optimizing Network Structure for 3D Human Pose Estimation
This work proposes a generic formulation of which both the GCN and the Fully Connected Network (FCN) are special cases, and introduces the Locally Connected Network (LCN), which is naturally implemented by this generic formulation and notably improves the representation capability over GCN.
Attention Mechanism Exploits Temporal Contexts: Real-Time 3D Human Pose Reconstruction
An attentional mechanism is designed to adaptively identify significant frames and tensor outputs from each deep neural net layer, leading to a more accurate estimation of 3D human pose from a monocular video.
Exploiting Temporal Information for 3D Human Pose Estimation
A sequence-to-sequence network is designed, composed of layer-normalized LSTM units with shortcut connections connecting the input to the output on the decoder side, and a temporal smoothness constraint is imposed during training. This helps the network recover temporally consistent 3D poses over a sequence of images even when the 2D pose detector fails.
Semantic Graph Convolutional Networks for 3D Human Pose Regression
The proposed Semantic Graph Convolutional Networks (SemGCN) are a novel neural network architecture for regression tasks on graph-structured data. SemGCN learns to capture semantic information, such as local and global node relationships, that is not explicitly represented in the graph.
Learning Pose Grammar to Encode Human Body Configuration for 3D Pose Estimation
This paper proposes a pose grammar to tackle the problem of 3D human pose estimation, which takes 2D pose as input and learns a generalized 2D-3D mapping function and enforces high-level constraints over human poses.
Propagating LSTM: 3D Pose Estimation Based on Joint Interdependency
It is demonstrated that joint interdependency (JI) drastically reduces the structural errors at body edges, thereby leading to a significant improvement in the accuracy of this novel 3D pose estimation method.
3D Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training
In this work, we demonstrate that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints. We also introduce