Human Pose Estimation in Space and Time Using 3D CNN

@inproceedings{Grinciunaite2016HumanPE,
  title={Human Pose Estimation in Space and Time Using 3D CNN},
  author={Agne Grinciunaite and Amogh Gudi and H. Emrah Tasli and Marten den Uyl},
  booktitle={ECCV Workshops},
  year={2016}
}
This paper explores the capabilities of convolutional neural networks to deal with a task that is easily manageable for humans: perceiving 3D pose of a human body from varying angles. However, in our approach, we are restricted to using a monocular vision system. For this purpose, we apply a convolutional neural network approach on RGB videos and extend it to three dimensional convolutions. This is done via encoding the time dimension in videos as the 3\(^\mathrm{rd}\) dimension in… 
MonoCap: Monocular Human Motion Capture using a CNN Coupled with a Geometric Prior
TLDR
This paper addresses the more challenging case of not only using a single camera but also not leveraging markers: going directly from 2D appearance to 3D geometry, using a novel approach that treats 2D joint locations as latent variables whose uncertainty distributions are given by a deep fully convolutional neural network.
3D Human Pose Estimation with Relational Networks
TLDR
The proposed network achieves state-of-the-art performance for 3D pose estimation in Human 3.6M dataset, and it effectively produces plausible results even in the existence of missing joints.
Skeleton-Based Human Action Recognition Using Spatial Temporal 3D Convolutional Neural Networks
TLDR
This paper proposes a novel two-stream model using 3D CNN in the field of skeleton-based action recognition, which outperforms most of RNN-based methods, which verify the complementary property between spatial and temporal information and the robustness to noise.
Two-Stream 3D Convolutional Neural Network for Skeleton-Based Action Recognition
TLDR
This paper proposes a novel two-stream model using 3D CNN in skeleton-based action recognition, which outperforms most of RNN-based methods, which verify the complementary property between spatial and temporal information and the robustness to noise.
3D human pose estimation with multi-scale graph convolution and hierarchical body pooling
TLDR
A light-weighted GCN for 3D pose lifting is developed by repeatedly stacking a residual block of multi-scale graph convolution and a hierarchical-body-pooling layer that can achieve state-of-the-art performance with much less model complexity.
Propagating LSTM: 3D Pose Estimation Based on Joint Interdependency
TLDR
It is demonstrated that the JI drastically reduces the structural errors at body edges, thereby leads to a significant improvement in the accuracy of this novel 3D pose estimation method.
Convolutional Memory Blocks for Depth Data Representation Learning
TLDR
A novel memory network module, called Convolutional Memory Block (CMB), is proposed, which empowers CNNs with the memory mechanism on handling depth data and is employed to develop a concise framework for predicting articulated pose from still depth images.
Human Action Classification Using Temporal Slicing for Deep Convolutional Neural Networks
TLDR
This paper attempts to provide an alternate methodology of feeding three-dimensional video data by preprocessing instead of directly inputting to a deep convolutional neural network, while achieving significantly higher accuracy compared to the aforementioned network architectures.
Estimation of Human Poses Categories and Physical Object Properties from Motion Trajectories
TLDR
A deep learning-based solution to classify the categorized body orientation in still images and an efficient framework based on the body orientation classification scheme to estimate human 3D pose in monocular RGB images are proposed.
Dynamic Kernel Distillation for Efficient Pose Estimation in Videos
TLDR
A novel Dynamic Kernel Distillation model is proposed to facilitate small networks for estimating human poses in videos, thus significantly lifting the efficiency and its state-of-the-art accuracy.
...
...

References

SHOWING 1-10 OF 21 REFERENCES
Efficient object localization using Convolutional Networks
TLDR
A novel architecture which includes an efficient `position refinement' model that is trained to estimate the joint offset location within a small region of the image to achieve improved accuracy in human joint location estimation is introduced.
Deep Convolutional Neural Networks for Efficient Pose Estimation in Gesture Videos
TLDR
This work is the first to their knowledge to use ConvNets for estimating human pose in videos and introduces a new network that exploits temporal information from multiple frames, leading to better performance.
3D Convolutional Neural Networks for Human Action Recognition
TLDR
A novel 3D CNN model for action recognition that extracts features from both the spatial and the temporal dimensions by performing 3D convolutions, thereby capturing the motion information encoded in multiple adjacent frames.
Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video
TLDR
This paper addresses the challenge of 3D full-body human pose estimation from a monocular image sequence with a novel approach that integrates a sparsity-driven 3D geometric prior and temporal smoothness and outperforms a publicly available 2D pose estimation baseline on the challenging PennAction dataset.
Learning Spatiotemporal Features with 3D Convolutional Networks
TLDR
The learned features, namely C3D (Convolutional 3D), with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with current best methods on the other 2 benchmarks.
Hands Deep in Deep Learning for Hand Pose Estimation
TLDR
It is shown that a prior on the 3D pose can be easily introduced and significantly improves the accuracy and reliability of the predictions, and how to use context efficiently to deal with ambiguities between fingers is shown.
Full-Body Human Pose Estimation from Monocular Video Sequence via Multi-dimensional Boosting Regression
TLDR
This work proposes a scheme to estimate two-dimensional full-body human poses in a monocular video sequence using multi-dimensional boosting regression, and designs a joints relationship tree, corresponding to the full hierarchical structure of joints in a human body.
Combining local appearance and holistic view: Dual-Source Deep Neural Networks for human pose estimation
TLDR
The experimental results show the effectiveness of the proposed DS-CNN method by comparing to the state-of-the-art human-pose estimation methods based on pose priors that are estimated from physiologically inspired graphical models or learned from a holistic perspective.
Spatio-Temporal Matching for Human Pose Estimation in Video
TLDR
This paper formulates the problem of human detection in videos as spatio-temporal matching (STM) between a 3D motion capture model and trajectories in videos and is the first paper that solves the correspondence between video and 3Dmotion capture data for human pose detection.
DeepPose: Human Pose Estimation via Deep Neural Networks
TLDR
The pose estimation is formulated as a DNN-based regression problem towards body joints and a cascade of such DNN regres- sors which results in high precision pose estimates.
...
...