View Adaptive Recurrent Neural Networks for High Performance Human Action Recognition from Skeleton Data

@inproceedings{Zhang2017ViewAR,
  title={View Adaptive Recurrent Neural Networks for High Performance Human Action Recognition from Skeleton Data},
  author={Pengfei Zhang and Cuiling Lan and Junliang Xing and Wenjun Zeng and Jianru Xue and Nanning Zheng},
  booktitle={2017 IEEE International Conference on Computer Vision (ICCV)},
  year={2017},
  pages={2136--2145}
}
Skeleton-based human action recognition has recently attracted increasing attention due to the popularity of 3D skeleton data. One main challenge lies in the large view variations in captured human actions. We propose a novel view adaptation scheme to automatically regulate observation viewpoints during the occurrence of an action. Rather than re-positioning the skeletons based on a human defined prior criterion, we design a view adaptive recurrent neural network (RNN) with LSTM architecture… 
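To make the view adaptation idea concrete, the sketch below shows how a skeleton can be re-expressed in a virtual observation viewpoint given a rotation (as Euler angles) and a translation. This is a minimal NumPy illustration of the geometric transform only; in the paper those parameters are predicted per frame by learned LSTM subnetworks, whereas here they are plain function arguments, and the function names are our own.

```python
import numpy as np

def rotation_matrix(alpha, beta, gamma):
    """Compose rotations about the x, y, and z axes (angles in radians)."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return rx @ ry @ rz

def adapt_view(joints, angles, translation):
    """Re-express skeleton joints (shape (J, 3)) in a virtual viewpoint
    defined by rotation angles and a translation vector: R @ (x - d)."""
    R = rotation_matrix(*angles)
    return (joints - translation) @ R.T
```

Because the transform is a rigid motion, it changes the observation viewpoint without distorting the skeleton: pairwise joint distances are preserved, so the downstream classifier sees the same pose from a regulated view.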


View Adaptive Neural Networks for High Performance Skeleton-Based Human Action Recognition
TLDR
A novel view adaptation scheme that automatically determines the virtual observation viewpoints over the course of an action in a learning-based, data-driven manner, together with a two-stream scheme (referred to as VA-fusion) that fuses the scores of the two networks to provide the final prediction with enhanced performance.
Spatio-Temporal and View Attention Deep Network for Skeleton based View-invariant Human Action Recognition
TLDR
Experimental results demonstrate the effectiveness of the proposed model on the current largest NTU action recognition dataset; a regularized cross-entropy loss is also proposed to ensure effective end-to-end training of the network.
SRNet: Structured Relevance Feature Learning Network From Skeleton Data for Human Action Recognition
TLDR
This paper proposes a novel data reorganizing strategy to represent the global and local structure information of human skeleton joints and proposes an end-to-end multi-dimensional CNN network to fully consider the spatial and temporal information to learn the feature extraction transform function.
Relational Network for Skeleton-Based Action Recognition
TLDR
An Attentional Recurrent Relational Network-LSTM (ARRN-LSTM) is proposed to simultaneously model spatial configurations and temporal dynamics in skeletons for action recognition and achieves better results than most mainstream methods.
Hypergraph Neural Network for Skeleton-Based Action Recognition
TLDR
This work proposes a hypergraph neural network (Hyper-GNN) to capture both spatial-temporal information and high-order dependencies for skeleton-based action recognition and demonstrates that the proposed method can achieve the best performance when compared with the state-of-the-art skeleton-based methods.
Skeleton-Based Relational Modeling for Action Recognition
TLDR
An Attentional Recurrent Relational Network-LSTM (ARRN-LSTM) to simultaneously model spatial configurations and temporal dynamics in skeletons for action recognition, which introduces an adaptive attentional module for focusing on potentially discriminative parts of the skeleton towards a certain action.
Self-Attention Network for Skeleton-based Human Action Recognition
TLDR
Three variants of the Self-Attention Network (SAN) are proposed, namely SAN-V1, SAN-V2, and SAN-V3, which extract high-level semantics by capturing long-range correlations; integrating the Temporal Segment Network (TSN) with the SAN variants further improves overall performance.
Graph Convolutional LSTM Model for Skeleton-Based Action Recognition
TLDR
A Graph Convolutional Long Short-Term Memory (GC-LSTM) model is proposed, which automatically learns spatiotemporal features to model the action, applying graph convolution at each time step for both input-to-state and state-to-state transitions.
Deep Stacked Bidirectional LSTM Neural Network for Skeleton-Based Action Recognition
TLDR
This work proposes a novel Deep Stacked Bidirectional LSTM Network (DSB-LSTM) for human action recognition from skeleton data that outperforms the compared methods on all datasets, demonstrating the effectiveness of the DSB-LSTM.
Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition
TLDR
A simple yet effective semantics-guided neural network (SGN) for skeleton-based action recognition is proposed, achieving state-of-the-art performance on the NTU60, NTU120, and SYSU datasets.
...

References

Showing 1-10 of 58 references
Co-Occurrence Feature Learning for Skeleton Based Action Recognition Using Regularized Deep LSTM Networks
TLDR
This work takes the skeleton as the input at each time slot and introduces a novel regularization scheme to learn the co-occurrence features of skeleton joints, and proposes a new dropout algorithm which simultaneously operates on the gates, cells, and output responses of the LSTM neurons.
An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data
TLDR
This work proposes an end-to-end spatial and temporal attention model for human action recognition from skeleton data on top of Recurrent Neural Networks with Long Short-Term Memory (LSTM), which learns to selectively focus on discriminative joints of the skeleton within each frame of the input and pays different levels of attention to the outputs of different frames.
Hierarchical recurrent neural network for skeleton based action recognition
  • Yong Du, Wei Wang, Liang Wang
  • Computer Science
    2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2015
TLDR
This paper proposes an end-to-end hierarchical RNN for skeleton based action recognition, and demonstrates that the model achieves the state-of-the-art performance with high computational efficiency.
Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition
TLDR
This paper introduces a new gating mechanism within LSTM to learn the reliability of the sequential input data and accordingly adjust its effect on updating the long-term context information stored in the memory cell, and proposes a more powerful tree-structure-based traversal method.
Informative joints based human action recognition using skeleton contexts
Learning Actionlet Ensemble for 3D Human Action Recognition
TLDR
This paper proposes to characterize the human actions with a novel actionlet ensemble model, which represents the interaction of a subset of human joints, which is robust to noise, invariant to translational and temporal misalignment, and capable of characterizing both the human motion and the human-object interactions.
Skeletal Quads: Human Action Recognition Using Joint Quadruples
  • Georgios Evangelidis, Gurkirt Singh, R. Horaud
  • Computer Science
    2014 22nd International Conference on Pattern Recognition
  • 2014
TLDR
A local skeleton descriptor that encodes the relative position of joint quadruples is proposed that outperforms state-of-the-art algorithms that rely only on joints, while it competes with methods that combine joints with extra cues.
NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis
TLDR
A large-scale dataset for RGB+D human action recognition with more than 56 thousand video samples and 4 million frames, collected from 40 distinct subjects is introduced and a new recurrent neural network structure is proposed to model the long-term temporal correlation of the features for each body part, and utilize them for better action classification.
View-invariant human action recognition via robust locally adaptive multi-view learning
TLDR
This paper presents a robust locally adaptive multi-view learning algorithm based on learning multiple local L1-graphs to recognize human actions from different views and obtains about 6% improvement in recognition accuracy on the three datasets.
View-Invariant Action Recognition Based on Artificial Neural Networks
TLDR
The proposed view-invariant action recognition method is the first to be tested in challenging experimental setups, demonstrating its effectiveness in dealing with most of the open issues in action recognition.
...