3D Convolutional Neural Networks for Human Action Recognition

@article{Ji20133DCN,
  title={3D Convolutional Neural Networks for Human Action Recognition},
  author={Shuiwang Ji and Wei Xu and Ming Yang and Kai Yu},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2013},
  volume={35},
  pages={221--231}
}
  • Shuiwang Ji, Wei Xu, Ming Yang, Kai Yu
  • Published 21 June 2010
  • Computer Science, Medicine
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
We consider the automated recognition of human actions in surveillance videos. [...] Key method: This model extracts features from both the spatial and the temporal dimensions by performing 3D convolutions, thereby capturing the motion information encoded in multiple adjacent frames. The developed model generates multiple channels of information from the input frames, and the final feature representation combines information from all channels.
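The following minimal NumPy sketch shows what a single 3D convolutional layer of this kind computes. Each output value comes from a cube that spans several adjacent frames and sums over all input channels, and a tanh nonlinearity is applied to the result. It is an illustrative, loop-based sketch and not the authors' implementation. The function name `conv3d_layer`, the (channels, frames, height, width) layout, and the clip and kernel sizes in the usage lines are all assumptions chosen for illustration.

```python
import numpy as np

def conv3d_layer(inputs, weights, biases):
    """Valid 3D convolution (cross-correlation, as in most CNN code) + tanh.

    Hypothetical helper for illustration only.
    inputs:  (M, T, H, W)    M input channels, each a stack of T frames
    weights: (K, M, P, Q, R) K output maps; kernel spans P frames, Q x R pixels
    biases:  (K,)
    returns: (K, T-P+1, H-Q+1, W-R+1)
    """
    M, T, H, W = inputs.shape
    K, M2, P, Q, R = weights.shape
    assert M == M2, "kernel channel count must match input channels"
    out = np.empty((K, T - P + 1, H - Q + 1, W - R + 1))
    for k in range(K):
        for t in range(T - P + 1):
            for y in range(H - Q + 1):
                for x in range(W - R + 1):
                    # A cube covering P adjacent frames in every input channel
                    cube = inputs[:, t:t + P, y:y + Q, x:x + R]
                    # Summing over channels and the 3D neighbourhood mixes
                    # temporal (motion) and multi-channel information
                    out[k, t, y, x] = np.sum(cube * weights[k]) + biases[k]
    return np.tanh(out)

# Usage with assumed sizes: 2 channels, 7 frames of 60 x 40 pixels,
# and 3 output maps from kernels spanning 3 frames and 7 x 7 pixels.
rng = np.random.default_rng(0)
clip = rng.standard_normal((2, 7, 60, 40))
w = rng.standard_normal((3, 2, 3, 7, 7)) * 0.01
b = np.zeros(3)
features = conv3d_layer(clip, w, b)
print(features.shape)  # (3, 5, 54, 34)
```

The explicit loops only make the sliding cube visible. A practical model would use a vectorised or library 3D convolution. Note that the temporal axis shrinks from 7 to 5 because each kernel spans 3 frames.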
Human Action Recognition by Fusion of Convolutional Neural Networks and spatial-temporal Information
TLDR
This paper modifies the existing network structure for action recognition. It then develops different 3D CNN models to fuse information from the spatial and temporal dimensions.
Using 3D convolutional neural network in surveillance videos for recognizing human actions
TLDR
The main aim is to develop a novel 3D convolutional neural network model for human action recognition in realistic environments. The model automatically recognizes specific human actions that need attention in real-world settings, such as pathways or the corridors of an organization.
A 2D Convolutional Neural Network Approach for Human Action Recognition
TLDR
A 2D-CNN approach is proposed that learns robust feature representations from the temporal information embedded in the motion history images of action videos. It compares favorably against handcrafted state-of-the-art methods.
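A motion history image can be sketched with the classic update rule. Pixels that move in the current frame are set to a duration value `tau`, and all other pixels decay by one step toward zero, so recent motion appears brighter than older motion. This sketch assumes a simple frame-differencing motion mask. The function name and the `tau` and `threshold` parameters are hypothetical, and the sketch is not claimed to be the preprocessing used in the cited paper.

```python
import numpy as np

def motion_history_image(frames, tau, threshold):
    """Hypothetical helper: build a motion history image from (T, H, W) frames."""
    frames = np.asarray(frames, dtype=float)
    mhi = np.zeros(frames.shape[1:])
    for t in range(1, frames.shape[0]):
        # Assumed motion mask: absolute frame difference above a threshold
        moving = np.abs(frames[t] - frames[t - 1]) > threshold
        # Moving pixels reset to tau; all others fade by 1, never below 0
        mhi = np.where(moving, tau, np.maximum(mhi - 1, 0))
    return mhi

# Usage: one bright pixel moving one step right in each of 4 frames
frames = np.zeros((4, 1, 5))
for t in range(4):
    frames[t, 0, t] = 255.0
mhi = motion_history_image(frames, tau=3, threshold=30)
print(mhi[0])  # [1. 2. 3. 3. 0.]
```

The decaying values encode the direction of motion in a single 2D image. This is what lets an ordinary 2D CNN consume temporal information.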
Human Action Recognition based on Convolutional Neural Networks with a Convolutional Auto-Encoder
TLDR
A type of deep model, a convolutional neural network (CNN), is proposed for HAR. It acts directly on the raw inputs and compares favorably against state-of-the-art algorithms that use hand-designed features.
Fully convolutional networks for action recognition
TLDR
A novel two-stream fully convolutional network architecture for action recognition is designed. It significantly reduces the number of parameters while maintaining performance, and it achieves state-of-the-art performance on two challenging datasets.
3D CNN for Human Action Recognition
TLDR
This paper proposes a HAR approach based on a 3D CNN model and applies it to recognize human actions in the KTH and J-HMDB datasets. It achieves state-of-the-art performance compared with baseline methods.
Action Recognition with Image Based CNN Features
TLDR
This paper presents a feature structure on top of fc7 features that can capture the temporal variation in a video. It also introduces a method for extracting key frames using a binary coding of each frame in a video, which helps to improve the performance of the hierarchical model.
Skeleton-Based Human Action Recognition Using Spatial Temporal 3D Convolutional Neural Networks
TLDR
This paper proposes a novel two-stream model that uses 3D CNNs for skeleton-based action recognition. The model outperforms most RNN-based methods, which verifies the complementary nature of spatial and temporal information and the model's robustness to noise.
Improved two-stream model for human action recognition
This paper addresses the recognition of human actions in videos. Human action recognition can be seen as the automatic labeling of a video according to the actions occurring in it. It has become one [...]
A Hybrid Deep Learning Architecture Using 3D CNNs and GRUs for Human Action Recognition
TLDR
This study proposes using a stack of gated recurrent unit (GRU) layers on top of a two-stream inflated convolutional neural network to improve classification accuracy on the challenging HMDB51 dataset.

References

Showing 1-10 of 67 references
Human Tracking Using Convolutional Neural Networks
TLDR
This paper treats tracking as a learning problem: estimating the location and scale of an object given its previous location and scale, as well as the current and previous image frames. It introduces multiple pathways in the CNN to better fuse local and global information.
Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis
TLDR
This paper presents an extension of the Independent Subspace Analysis algorithm to learn invariant spatio-temporal features from unlabeled video data. It finds that this method performs surprisingly well when combined with deep learning techniques such as stacking and convolution to learn hierarchical representations.
Hidden Part Models for Human Action Recognition: Probabilistic versus Max Margin
  • Yang Wang, Greg Mori
  • Computer Science, Medicine
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2011
TLDR
This work presents a discriminative part-based approach for human action recognition from video sequences using motion features. The approach builds on the recently proposed hidden conditional random field (HCRF) for object recognition, and the work demonstrates that MMHCRF outperforms HCRF in human action recognition.
Human action detection by boosting efficient motion features
TLDR
A novel action representation scheme using a set of motion edge history images is proposed, which not only encodes both shape and motion patterns of actions without relying on precise alignment of human figures, but also facilitates learning of fast tree-structured boosting classifiers.
Learning realistic human actions from movies
TLDR
A new method for video classification is presented and shown to improve state-of-the-art results on the standard KTH action dataset. It builds upon and extends several recent ideas, including local space-time features, space-time pyramids and multi-channel non-linear SVMs.
Detecting Human Actions in Surveillance Videos
TLDR
This notebook paper summarizes Team NEC-UIUC’s approaches for the TRECVid 2009 Evaluation of Surveillance Event Detection. The approaches combine 3D convolutional neural networks (CNNs) and SVM classifiers based on bag-of-words local features to detect the presence of events of interest.
Human Action Recognition Using a Modified Convolutional Neural Network
TLDR
A feature selection technique using the WFMM model to reduce the dimensionality of the feature space is introduced and two kinds of relevance factors between features and pattern classes are defined to analyze the salient features.
Recognizing realistic actions from videos
TLDR
This paper presents a systematic framework for recognizing realistic actions from videos “in the wild”. It uses motion statistics to acquire stable motion features and clean static features, and it uses PageRank to mine the most informative static features.
Evaluation of Local Spatio-temporal Features for Action Recognition
TLDR
It is demonstrated that regular sampling of space-time features consistently outperforms all tested space-time interest point detectors for human actions in realistic settings. The work also shows that the majority of methods are ranked consistently across different datasets.
Convolutional Learning of Spatio-temporal Features
TLDR
A model that learns latent representations of image sequences from pairs of successive images is introduced, allowing it to scale to realistic image sizes whilst using a compact parametrization.