VideoLSTM Convolves, Attends and Flows for Action Recognition

Zhenyang Li, Efstratios Gavves, Mihir Jain, Cees Snoek
Computer Vision and Image Understanding
We present VideoLSTM for end-to-end sequence learning of actions in video. Rather than adapting the video to the peculiarities of established recurrent or convolutional architectures, we adapt the architecture to fit the requirements of the video medium. Starting from the soft-Attention LSTM, VideoLSTM makes three novel contributions. First, video has a spatial layout. To exploit the spatial correlation we hardwire convolutions in the soft-Attention LSTM architecture. Second, motion not only…