A Key Volume Mining Deep Framework for Action Recognition

@article{Zhu2016AKV,
  title={A Key Volume Mining Deep Framework for Action Recognition},
  author={Wangjiang Zhu and Jie Hu and Gang Sun and Xudong Cao and Yu Qiao},
  journal={2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2016},
  pages={1991-1999}
}
  • Wangjiang Zhu, Jie Hu, Gang Sun, Xudong Cao, Yu Qiao
  • Published 27 June 2016
  • Computer Science
  • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Recently, deep learning approaches have demonstrated remarkable progress for action recognition in videos. [...]
Key Method
Specifically, our framework is trained in an alternating manner integrated into the forward and backward stages of Stochastic Gradient Descent (SGD). In the forward pass, our network mines key volumes for each action class; in the backward pass, it updates network parameters with the help of these mined key volumes.
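The alternating scheme can be illustrated with a minimal PyTorch-style sketch. This is a sketch only, under assumed names (VolumeCNN, key_volume_step) and an assumed top-k mining rule; it is not the authors' released code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VolumeCNN(nn.Module):
    """Toy per-volume classifier standing in for the deep network."""
    def __init__(self, in_dim=512, num_classes=101):
        super().__init__()
        self.fc = nn.Linear(in_dim, num_classes)

    def forward(self, volumes):              # volumes: (N, T, in_dim)
        return self.fc(volumes)              # per-volume class scores (N, T, C)

def key_volume_step(model, optimizer, volumes, labels, k=3):
    """One SGD step: mine key volumes in the forward pass, then update
    parameters using only those volumes in the backward pass."""
    scores = model(volumes)                                   # (N, T, C)
    # Forward pass: for each video, pick the k volumes that score highest
    # on the ground-truth class -- these act as the mined "key volumes".
    gt = labels.view(-1, 1, 1).expand(-1, scores.size(1), 1)
    gt_scores = scores.gather(2, gt).squeeze(2)               # (N, T)
    _, key_idx = gt_scores.topk(k, dim=1)                     # (N, k)
    key_scores = scores.gather(
        1, key_idx.unsqueeze(-1).expand(-1, -1, scores.size(2)))  # (N, k, C)
    # Backward pass: the loss, and hence the gradient, depends only on
    # the mined key volumes.
    loss = F.cross_entropy(key_scores.mean(dim=1), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

model = VolumeCNN()
opt = torch.optim.SGD(model.parameters(), lr=0.01)
vols = torch.randn(8, 12, 512)               # 8 videos, 12 candidate volumes each
labels = torch.randint(0, 101, (8,))
key_volume_step(model, opt, vols, labels)

Here the per-class key volumes are simply the top-k highest-scoring volumes for the ground-truth class; the paper's actual mining criterion and network may differ.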
Sequential Segment Networks for Action Recognition
TLDR
This work proposes a deep learning framework, sequential segment networks (SSN), to model video-level temporal structure, and achieves state-of-the-art performance on the UCF101 and HMDB51 datasets.
Improving human action recognition by temporal attention
TLDR
A temporal attention model that learns to recognize human actions in videos while focusing selectively on the informative frames; it consistently improves on no-attention methods with both RGB and optical flow based deep ConvNets.
Attention-Aware Sampling via Deep Reinforcement Learning for Action Recognition
TLDR
An attention-aware sampling method for action recognition that aims to discard irrelevant and misleading frames and preserve the most discriminative frames, and that can be applied to different existing deep learning based action recognition models.
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
Deep convolutional networks have achieved great success for visual recognition in still images. However, for action recognition in videos, the advantage over traditional methods is not so evident.
Moving Foreground-Aware Visual Attention and Key Volume Mining for Human Action Recognition
TLDR
A novel deep model called Moving Foreground Attention (MFA) is proposed that enhances the performance of action recognition by guiding the model to focus on the discriminative foreground targets.
Deep Moving Poselets for Video Based Action Recognition
TLDR
A new approach to action classification in video, which uses deep appearance and motion features extracted from spatio-temporal volumes defined along body part trajectories to learn mid-level classifiers called deep moving poselets, achieves state-of-the-art performance on the popular and challenging sub-JHMDB and MSR Daily Activity datasets.
Going deeper with two-stream ConvNets for action recognition in video surveillance
TLDR
A novel, deeper two-stream ConvNet is designed to learn action complexity; combined with a dis-order strategy for the training/testing video sets, the proposed model and learning strategy collaboratively achieve a significant improvement in action recognition.
A novel recurrent hybrid network for feature fusion in action recognition
TLDR
A recurrent hybrid network architecture is designed for action recognition by fusing multi-source features: two-stream CNNs for learning semantic features, a three-stream single-layer LSTM for learning long-term temporal features, and an Improved Dense Trajectories stream for learning short-term motion features.
A Comprehensive Study of Deep Video Action Recognition
TLDR
A comprehensive survey of over 200 existing papers on deep learning for video action recognition is provided, starting with early attempts at adapting deep learning, then to the two-stream networks, followed by the adoption of 3D convolutional kernels, and finally to the recent compute-efficient models.
Deep Learning for Action and Gesture Recognition in Image Sequences: A Survey
TLDR
This chapter is a survey of current deep learning based methodologies for action and gesture recognition in sequences of images, and introduces a taxonomy that summarizes important aspects of deep learning for approaching both tasks.

References

SHOWING 1-10 OF 40 REFERENCES
Learning Discriminative Space–Time Action Parts from Weakly Labelled Videos
TLDR
By using local space–time action parts in a weakly supervised setting, this work demonstrates a local deformable spatial bag-of-features in which local discriminative regions are split into a fixed grid of parts that are allowed to deform in both space and time at test time.
Two-Stream Convolutional Networks for Action Recognition in Videos
TLDR
This work proposes a two-stream ConvNet architecture which incorporates spatial and temporal networks and demonstrates that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data.
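As a rough illustration of the two-stream idea (a minimal sketch under simplified assumptions, not the paper's actual networks): the spatial stream takes a single RGB frame, the temporal stream takes a stack of optical-flow fields, and class scores are fused by averaging.

import torch
import torch.nn as nn

class TwoStream(nn.Module):
    def __init__(self, num_classes=101, flow_stack=10):
        super().__init__()
        # Spatial stream: 3 input channels (RGB frame).
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes))
        # Temporal stream: 2 * flow_stack channels (x/y flow per frame).
        self.temporal = nn.Sequential(
            nn.Conv2d(2 * flow_stack, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes))

    def forward(self, rgb, flow):
        # Late fusion: average the two streams' class scores.
        return (self.spatial(rgb) + self.temporal(flow)) / 2

rgb = torch.randn(4, 3, 224, 224)        # batch of RGB frames
flow = torch.randn(4, 20, 224, 224)      # 10 stacked x/y optical-flow fields
scores = TwoStream()(rgb, flow)          # -> (4, 101) class scores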
Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification
TLDR
This work proposes a hybrid deep learning framework for video classification, which is able to model static spatial information, short-term motion, as well as long-term temporal clues in the videos, and achieves very competitive performance on two popular and challenging benchmarks.
Action recognition with trajectory-pooled deep-convolutional descriptors
TLDR
This paper presents a new video representation, called trajectory-pooled deep-convolutional descriptor (TDD), which shares the merits of both hand-crafted features and deep-learned features, and achieves superior performance to the state of the art on these datasets.
Beyond short snippets: Deep networks for video classification
TLDR
This work proposes and evaluates several deep neural network architectures to combine image information across a video over longer time periods than previously attempted, and proposes two methods capable of handling full length videos.
Less Is More: Video Trimming for Action Recognition
TLDR
A method for learning a subsequence classifier which can detect and classify the part of a video that corresponds to the action; favorable performance of the subsequence classifier for temporal localization of actions in videos is demonstrated on two categories of the Hollywood2 dataset.
Towards Good Practices for Very Deep Two-Stream ConvNets
TLDR
This report presents very deep two-stream ConvNets for action recognition by adapting recent very deep architectures to the video domain, and extends the Caffe toolbox into a multi-GPU implementation with high computational efficiency and low memory consumption.
3D Convolutional Neural Networks for Human Action Recognition
TLDR
A novel 3D CNN model for action recognition that extracts features from both the spatial and the temporal dimensions by performing 3D convolutions, thereby capturing the motion information encoded in multiple adjacent frames.
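A minimal sketch of the 3D-convolution idea (illustrative only, not the paper's exact architecture): each 3D filter spans several adjacent frames as well as the spatial dimensions, so motion across frames is captured directly by the convolution.

import torch
import torch.nn as nn

clip = torch.randn(1, 3, 16, 112, 112)   # (batch, channels, frames, H, W)
block = nn.Sequential(
    nn.Conv3d(3, 64, kernel_size=(3, 3, 3), padding=1),  # 3x3x3 spatio-temporal filter
    nn.ReLU(),
    nn.MaxPool3d(kernel_size=(1, 2, 2)),                  # pool spatially, keep time
)
features = block(clip)                    # -> (1, 64, 16, 56, 56)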
Large-Scale Video Classification with Convolutional Neural Networks
TLDR
This work studies multiple approaches for extending the connectivity of a CNN in time domain to take advantage of local spatio-temporal information and suggests a multiresolution, foveated architecture as a promising way of speeding up the training.
Unsupervised Learning of Video Representations using LSTMs
TLDR
This work uses Long Short Term Memory networks to learn representations of video sequences and evaluates the representations by finetuning them for a supervised learning problem - human action recognition on the UCF-101 and HMDB-51 datasets.