• Corpus ID: 201314491

Trajectory Space Factorization for Deep Video-Based 3D Human Pose Estimation

@article{Lin2019TrajectorySF,
  title={Trajectory Space Factorization for Deep Video-Based 3D Human Pose Estimation},
  author={Jiahao Lin and Gim Hee Lee},
  journal={ArXiv},
  year={2019},
  volume={abs/1908.08289}
}
Existing deep learning approaches to 3D human pose estimation from videos are based on either recurrent or convolutional neural networks (RNNs or CNNs). More specifically, the 3D poses in all frames are represented as a motion matrix that is factorized into a trajectory bases matrix and a trajectory coefficient matrix.
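The factorization described in the abstract can be illustrated with a minimal NumPy sketch. This is a hypothetical toy example, not the paper's implementation: the frame count, joint count, number of bases, and the use of DCT-style trajectory bases are all assumed here for illustration.

```python
import numpy as np

# Toy motion matrix M (3J x F): 3D coordinates of J joints stacked over F
# frames. It is approximated as M ~ C @ B, where B (K x F) holds K fixed
# trajectory bases (DCT-like cosines, an assumed choice) and C (3J x K)
# holds the per-joint trajectory coefficients.
F, J, K = 50, 17, 8                                  # assumed sizes

t = (np.arange(F) + 0.5) * np.pi / F
B = np.array([np.cos(k * t) for k in range(K)])      # trajectory bases, (K, F)

rng = np.random.default_rng(0)
M = rng.standard_normal((3 * J, F))                  # toy motion matrix

C = M @ np.linalg.pinv(B)                            # coefficients, (3J, K)
M_hat = C @ B                                        # low-rank reconstruction
print(M_hat.shape)                                   # (51, 50)
```

In the deep-learning setting, a network would predict the coefficient matrix `C` while the bases `B` stay fixed, so the temporal smoothness of the pose sequence is built into the representation.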

Citations of this paper

Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation

An improved Transformer-based architecture is proposed for 3D human pose estimation in videos to lift a sequence of 2D joint locations to a 3D pose, and achieves state-of-the-art results with much fewer parameters.

A Graph Attention Spatio-temporal Convolutional Network for 3D Human Pose Estimation in Video

This work improves the learning of kinematic constraints in the human skeleton (posture, local kinematic connections, and symmetry) by modeling local and global spatial information via attention mechanisms, proposing a simple yet effective graph attention spatio-temporal convolutional network (GAST-Net).

IVT: An End-to-End Instance-guided Video Transformer for 3D Pose Estimation

This paper simplifies the paradigm into an end-to-end framework, the Instance-guided Video Transformer (IVT), which effectively learns spatiotemporal contextual depth information from visual features and predicts 3D poses directly from video frames; it also proposes a cross-scale instance-guided attention mechanism to handle the varying scales among multiple persons.

Learning Dynamical Human-Joint Affinity for 3D Pose Estimation in Videos

DG-Net is proposed, which can dynamically identify human-joint affinity and estimate 3D pose by adaptively learning spatial/temporal joint relations from videos, and which outperforms a number of recent SOTA approaches with fewer input frames and a smaller model size.

U-shaped spatial–temporal transformer network for 3D human pose estimation

A novel U-shaped spatial–temporal transformer-based network (U-STN) for 3D human pose estimation that can transform features across different scales and extract meaningful semantic features at all levels is presented.

MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video

This work proposes MixSTE (Mixed Spatio-Temporal Encoder), which has a temporal transformer block to separately model the temporal motion of each joint and a spatial transformer block for inter-joint spatial correlation, and which extends estimation from the central frame to all frames of the input video, thereby improving the coherence between the input and output sequences.

Learning Skeletal Graph Neural Networks for Hard 3D Pose Estimation

This work proposes a hop-aware hierarchical channel-squeezing fusion layer to effectively extract relevant information from neighboring nodes while suppressing undesired noises in GNN learning and proposes a temporal-aware dynamic graph construction procedure that is robust and effective for 3D pose estimation.

An Improved 3D Human Pose Estimation Model Based on Temporal Convolution with Gaussian Error Linear Units

  • Jian Kang, R. Liu, D. Zhou
  • Computer Science
    2022 8th International Conference on Virtual Reality (ICVR)
  • 2022
They decompose the 3D joint location regression into bone direction and bone length, and propose a temporal convolutional network incorporating Gaussian error linear units (TCG) to estimate bone direction, which enables more inter-frame features to be captured and allows the feature relationships in the data to be fully exploited.
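The bone-based decomposition mentioned in the summary above can be sketched in a few lines. The skeleton topology (parent indices) and joint count below are assumed toy values, not the paper's skeleton:

```python
import numpy as np

# Decompose 3D joint locations into bone lengths and bone directions.
# A bone is the vector from a joint's parent to the joint itself.
rng = np.random.default_rng(0)
pose3d = rng.standard_normal((5, 3))          # 5 joints (toy skeleton)
parents = np.array([-1, 0, 1, 1, 3])          # parent per joint; -1 = root

child = np.arange(1, 5)                       # every non-root joint
bones = pose3d[child] - pose3d[parents[child]]       # bone vectors, (4, 3)
lengths = np.linalg.norm(bones, axis=1)              # bone lengths, (4,)
directions = bones / lengths[:, None]                # unit directions, (4, 3)
print(lengths.shape, directions.shape)
```

Separating the two factors lets a network predict the near-constant bone lengths and the time-varying bone directions with different mechanisms.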

Multi-View Pose Generator Based on Deep Learning for Monocular 3D Human Pose Estimation

This paper proposes a novel end-to-end deep network for monocular 3D human pose estimation, along with a simple but effective data augmentation method for generating multi-view 2D pose annotations.

A Review of 3D Human Pose Estimation from 2D Images

An overview of the classic and deep learning-based 3D pose estimation approaches is provided, pointing out relevant evaluation metrics, pose parametrizations, body models, and 3D human pose datasets.

References

Showing 1-10 of 43 references

Exploiting Temporal Information for 3D Human Pose Estimation

A sequence-to-sequence network is designed, composed of layer-normalized LSTM units with shortcut connections from the input to the output on the decoder side; a temporal smoothness constraint imposed during training helps the network recover temporally consistent 3D poses over a sequence of images even when the 2D pose detector fails.

Recurrent 3D Pose Sequence Machines

A Recurrent 3D Pose Sequence Machine (RPSM) is presented to automatically learn the image-dependent structural constraint and sequence-dependent temporal context by using a multi-stage sequential refinement.

3D Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training

In this work, we demonstrate that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints. We also introduce back-projection, a simple and effective semi-supervised training method that leverages unlabeled video data.
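A dilated temporal convolution over 2D keypoint sequences, the building block this reference describes, can be sketched as follows. The channel counts, kernel sizes, and dilation rates are assumed values for illustration, not the reference's actual configuration:

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """Valid dilated 1D convolution. x: (C_in, T), w: (C_out, C_in, K)."""
    C_out, C_in, K = w.shape
    T_out = x.shape[1] - dilation * (K - 1)
    y = np.empty((C_out, T_out))
    for t in range(T_out):
        taps = x[:, t : t + dilation * K : dilation]   # K dilated samples
        y[:, t] = np.tensordot(w, taps, axes=([1, 2], [0, 1]))
    return y

rng = np.random.default_rng(0)
J, T = 17, 27                                   # joints, input frames
x = rng.standard_normal((2 * J, T))             # 2D keypoints per frame
h = dilated_conv1d(x, rng.standard_normal((64, 2 * J, 3)), dilation=1)
h = dilated_conv1d(h, rng.standard_normal((64, 64, 3)), dilation=3)
out = dilated_conv1d(h, rng.standard_normal((3 * J, 64, 1)), dilation=1)
print(out.shape)   # (51, 19): each dilated layer shrinks the temporal extent
```

Stacking layers with growing dilation rates enlarges the temporal receptive field exponentially while keeping the parameter count low, which is what makes this an attractive alternative to RNNs for video.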

Monocular 3D Human Pose Estimation Using Transfer Learning and Improved CNN Supervision

We propose a new CNN-based method for regressing 3D human body pose from a single image that improves over the state of the art on standard benchmarks by more than 25%.

3D Human Pose Estimation from Monocular Images with Deep Convolutional Neural Network

A deep convolutional neural network for 3D human pose estimation from monocular images is proposed, and it is shown empirically that the network disentangles the dependencies among different body parts and learns their correlations.

Structured Prediction of 3D Human Pose with Deep Neural Networks

This paper introduces a Deep Learning regression architecture for structured prediction of 3D human pose from monocular images that relies on an overcomplete autoencoder to learn a high-dimensional latent pose representation and account for joint dependencies.

A Simple Yet Effective Baseline for 3d Human Pose Estimation

The results indicate that a large portion of the error of modern deep 3d pose estimation systems stems from their visual analysis, and suggests directions to further advance the state of the art in 3d human pose estimation.

3D Human Pose Estimation from a Single Image via Distance Matrix Regression

  • F. Moreno-Noguer
  • Computer Science
    2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2017
It is shown that more precise pose estimates can be obtained by representing both the 2D and 3D human poses using N×N distance matrices and formulating the problem as a 2D-to-3D distance matrix regression.
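The N×N distance-matrix representation this reference uses is easy to sketch: a pose with N joints is encoded by its matrix of pairwise Euclidean distances, which is invariant to rotation and translation of the pose. The joint count below is an assumed example:

```python
import numpy as np

# Encode a pose as its N x N matrix of pairwise joint distances.
rng = np.random.default_rng(0)
pose3d = rng.standard_normal((17, 3))            # N = 17 joints (assumed)

diff = pose3d[:, None, :] - pose3d[None, :, :]   # all pairwise differences
D = np.linalg.norm(diff, axis=-1)                # distance matrix, (17, 17)
print(D.shape)                                   # (17, 17)
```

The matrix is symmetric with a zero diagonal, so only N(N-1)/2 values carry information; the regression then maps the 2D distance matrix to its 3D counterpart.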

VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera

This work presents the first real-time method to capture the full global 3D skeletal pose of a human in a stable, temporally consistent manner using a single RGB camera and shows that the approach is more broadly applicable than RGB-D solutions, i.e., it works for outdoor scenes, community videos, and low quality commodity RGB cameras.

Learning 3D Human Pose from Structure and Motion

This work proposes two anatomically inspired loss functions and uses them with a weakly-supervised learning framework to jointly learn from large-scale in-the-wild 2D and indoor/synthetic 3D data and presents a simple temporal network that exploits temporal and structural cues present in predicted pose sequences to temporally harmonize the pose estimations.