Lattice Long Short-Term Memory for Human Action Recognition
@article{Sun2017LatticeLS,
  title   = {Lattice Long Short-Term Memory for Human Action Recognition},
  author  = {Lin Sun and Kui Jia and Kevin Chen and Dit-Yan Yeung and Bertram E. Shi and Silvio Savarese},
  journal = {2017 IEEE International Conference on Computer Vision (ICCV)},
  year    = {2017},
  pages   = {2166-2175}
}
Human actions captured in video sequences are three-dimensional signals characterizing visual appearance and motion dynamics. […] Key method: we additionally introduce a novel multi-modal training procedure for training our network.
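The abstract describes an LSTM-based model for video. As a hedged illustration only (this is a plain LSTM cell, not the paper's Lattice LSTM, whose gates operate over a spatial lattice), the sketch below steps a minimal pure-Python LSTM cell over a toy sequence of per-frame feature vectors; all class and variable names here are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class LSTMCell:
    """Minimal illustrative LSTM cell (pure Python, not the paper's Lattice LSTM)."""
    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
        # One weight matrix and bias per gate: input (i), forget (f), output (o), candidate (g).
        self.W = {g: mat(hidden_size, input_size + hidden_size) for g in "ifog"}
        self.b = {g: [0.0] * hidden_size for g in "ifog"}
        self.hidden_size = hidden_size

    def step(self, x, h, c):
        z = x + h  # concatenate input features with previous hidden state
        def gate(name, act):
            return [act(sum(w * v for w, v in zip(row, z)) + b)
                    for row, b in zip(self.W[name], self.b[name])]
        i = gate("i", sigmoid)
        f = gate("f", sigmoid)
        o = gate("o", sigmoid)
        g = gate("g", math.tanh)
        # Standard LSTM recurrence: c' = f*c + i*g ; h' = o * tanh(c')
        c_new = [fk * ck + ik * gk for fk, ik, gk, ck in zip(f, i, g, c)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

# Run the cell over a toy sequence of per-frame feature vectors.
cell = LSTMCell(input_size=4, hidden_size=3)
h = [0.0] * 3
c = [0.0] * 3
frames = [[0.5, -0.2, 0.1, 0.9] for _ in range(5)]
for x in frames:
    h, c = cell.step(x, h, c)
print(len(h), len(c))  # prints "3 3"
```

In a video model along the lines surveyed here, `x` would be a per-frame CNN feature vector and the final `h` would feed an action classifier; the Lattice LSTM paper enhances this recurrence with location-dependent gating, which is not shown above.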
113 Citations
Memory-Augmented Temporal Dynamic Learning for Action Recognition
- Computer Science, AAAI
- 2019
This work proposes a memory-augmented temporal dynamic learning network, which learns to write the most evident information into an external memory module and ignore irrelevant information, and presents a differential memory controller that makes a discrete decision on whether the external memory module should be updated with the current feature.
Recurrent Spatiotemporal Feature Learning for Action Recognition
- Computer Science, ICRAI 2018
- 2018
The proposed architecture is end-to-end trainable and flexible enough to be adapted to any CNN-based structure, and produces state-of-the-art performance over RNN-based approaches on two standard benchmarks for action recognition.
Relational Long Short-Term Memory for Video Action Recognition
- Computer Science, ArXiv
- 2018
This paper presents a new variant of Long Short-Term Memory, namely Relational LSTM, to address the challenge of relation reasoning across space and time between objects, and proposes a two-branch neural architecture consisting of the Relational LSTM module as the non-local branch and a spatio-temporal pooling based local branch.
Action Recognition Based on Linear Dynamical Systems with Deep Features in Videos
- Computer Science, 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC)
- 2020
Experimental results show that the proposed framework simultaneously expresses spatial and temporal structures, which in turn produces state-of-the-art results.
Temporal Action Localization Using Long Short-Term Dependency
- Computer Science, IEEE Transactions on Multimedia
- 2021
A novel method, referred to as the Gemini Network, is developed for effective modeling of temporal structures and achieving high-performance temporal action localization on two challenging datasets, namely, THUMOS14 and ActivityNet.
Attend It Again: Recurrent Attention Convolutional Neural Network for Action Recognition
- Computer Science
- 2018
This study improves the performance of the recurrent attention convolutional neural network (RACNN) by proposing a novel model, "attention-again", a variant of the traditional attention model for recognizing human activities, which is embedded in two long short-term memory (LSTM) layers.
Temporal Segment Connection Network for Action Recognition
- Computer Science, IEEE Access
- 2020
The proposed temporal segment connection network can effectively improve the utilization rate of temporal information and the ability of overall action representation, thus significantly improving the accuracy of human action recognition.
A motion-aware ConvLSTM network for action recognition
- Computer Science, Applied Intelligence
- 2018
A spatio-temporal video recognition network in which a motion-aware long short-term memory module is introduced to estimate the motion flow while extracting spatio-temporal features, subsuming a specific optical flow estimator based on kernelized cross-correlation.
Multi-stream Convolutional Neural Networks for Action Recognition in Video Sequences Based on Adaptive Visual Rhythms
- Computer Science, 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA)
- 2018
A multi-stream network is the architecture of choice to incorporate temporal information, since it may benefit from pre-trained deep networks for images and from handcrafted features for initialization, and its training cost is usually lower than video-based networks.
References
Showing 1-10 of 46 references
Long-term recurrent convolutional networks for visual recognition and description
- Computer Science, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015
A novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and shows such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.
Spatiotemporal Residual Networks for Video Action Recognition
- Computer Science, NIPS
- 2016
The novel spatiotemporal ResNet is introduced and evaluated using two widely used action recognition benchmarks where it exceeds the previous state-of-the-art.
Human Action Recognition Using Factorized Spatio-Temporal Convolutional Networks
- Computer Science, 2015 IEEE International Conference on Computer Vision (ICCV)
- 2015
Factorized spatio-temporal convolutional networks (FstCN) are proposed that factorize the original 3D convolution kernel learning into a sequential process of learning 2D spatial kernels in the lower layers, followed by learning 1D temporal kernels in the upper layers.
Regularizing Long Short Term Memory with 3D Human-Skeleton Sequences for Action Recognition
- Computer Science, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
This paper argues that large-scale action recognition in video can be greatly improved by providing an additional modality in training data - namely, 3D human-skeleton sequences - aimed at…
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
- Computer Science, ECCV
- 2016
Deep convolutional networks have achieved great success for visual recognition in still images. However, for action recognition in videos, the advantage over traditional methods is not so evident.…
Unsupervised Learning of Video Representations using LSTMs
- Computer Science, ICML
- 2015
This work uses Long Short Term Memory networks to learn representations of video sequences and evaluates the representations by finetuning them for a supervised learning problem - human action recognition on the UCF-101 and HMDB-51 datasets.
Action Recognition using Visual Attention
- Computer Science, NIPS 2015
- 2015
A soft-attention-based model is proposed, using multi-layered recurrent neural networks with long short-term memory units that are deep both spatially and temporally, for action recognition in videos.
VideoLSTM convolves, attends and flows for action recognition
- Computer Science, Comput. Vis. Image Underst.
- 2018
Two-Stream Convolutional Networks for Action Recognition in Videos
- Computer Science, NIPS
- 2014
This work proposes a two-stream ConvNet architecture which incorporates spatial and temporal networks and demonstrates that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data.
Convolutional Two-Stream Network Fusion for Video Action Recognition
- Computer Science, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
A new ConvNet architecture for spatiotemporal fusion of video snippets is proposed, and its performance on standard benchmarks where this architecture achieves state-of-the-art results is evaluated.