ActionVLAD: Learning Spatio-Temporal Aggregation for Action Classification

Rohit Girdhar, Deva Ramanan, Abhinav Gupta, Josef Sivic, Bryan C. Russell
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

In this work, we introduce a new video representation for action classification that aggregates local convolutional features across the entire spatio-temporal extent of the video. We do so by integrating state-of-the-art two-stream networks [42] with learnable spatio-temporal feature aggregation [6]. The resulting architecture is end-to-end trainable for whole-video classification. We investigate different strategies for pooling across space and time and combining signals from the different…
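The aggregation idea can be sketched in a few lines. This is a minimal, non-authoritative illustration that assumes a NetVLAD-style soft-assignment for the learnable aggregation cited as [6]. Every local descriptor from every frame and every spatial position is softly assigned to K anchor points. The residuals to each anchor are then summed over both space and time, so a video of any length yields one fixed-size K x D descriptor. Several details are assumptions made for illustration: the function names `actionvlad_pool` and `normalize_vlad`, the nested-list feature layout, the distance-based form of the soft-assignment, the `alpha` sharpness parameter, and the normalization steps. In a trainable network the anchors and assignment weights would be learned end-to-end rather than fixed as they are here.

```python
import math
from typing import List, Sequence

Vector = List[float]


def actionvlad_pool(
    features: Sequence[Sequence[Vector]],
    anchors: Sequence[Vector],
    alpha: float = 1.0,
) -> List[Vector]:
    """Sum soft-assigned residuals over every frame t and spatial cell i.

    features[t][i] is the D-dim local descriptor at cell i of frame t
    (hypothetical layout; a real network would read these off a conv map).
    anchors[k] is anchor c_k; here they are fixed, not learned.
    Returns a K x D matrix V with
        V[k][j] = sum_t sum_i a_k(x_ti) * (x_ti[j] - c_k[j]).
    """
    num_anchors = len(anchors)
    dim = len(anchors[0])
    vlad = [[0.0] * dim for _ in range(num_anchors)]
    for frame in features:          # pool over time
        for x in frame:             # pool over space
            # Soft-assignment a_k(x): softmax of -alpha * ||x - c_k||^2,
            # shifted by the max logit for numerical stability.
            logits = [
                -alpha * sum((xj - cj) ** 2 for xj, cj in zip(x, c))
                for c in anchors
            ]
            top = max(logits)
            weights = [math.exp(s - top) for s in logits]
            total = sum(weights)
            for k, c in enumerate(anchors):
                a_k = weights[k] / total
                for j in range(dim):
                    vlad[k][j] += a_k * (x[j] - c[j])
    return vlad


def normalize_vlad(vlad: List[Vector]) -> Vector:
    """Intra-normalize each anchor's row, then L2-normalize the flattened vector."""
    def l2(v: Vector) -> Vector:
        n = math.sqrt(sum(e * e for e in v))
        return [e / n for e in v] if n > 0 else [0.0] * len(v)

    flat = [e for row in vlad for e in l2(row)]
    return l2(flat)


# Usage: two "frames", one 2-D descriptor each, and two anchors.
anchors = [[0.0, 0.0], [10.0, 0.0]]
video = [[[1.0, 0.0]], [[0.0, 1.0]]]
V = actionvlad_pool(video, anchors)
descriptor = normalize_vlad(V)
print(len(descriptor))  # → 4 (K * D)
```

The residuals are summed over all frames, so reversing or shuffling the frame order leaves `V` unchanged up to floating-point rounding. This orderless pooling is what lets one fixed-size descriptor stand for a whole video. The two-stream side [42] is not modeled in this sketch, and how the appearance and motion signals are combined is left open here.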