Rethinking Spatiotemporal Feature Learning For Video Understanding

Saining Xie, Chen Sun, Jonathan Huang, Zhuowen Tu, Kevin Murphy
In this paper we study 3D convolutional networks for video understanding tasks. Our starting point is the state-of-the-art I3D model of [3], which “inflates” all the 2D filters of the Inception architecture to 3D. We first consider “deflating” the I3D model at various levels to understand the role of 3D convolutions. Interestingly, we found that 3D convolutions at the top layers of the network contribute more than 3D convolutions at the bottom layers, while also being computationally more…
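As a rough illustration of the “inflation” mentioned in the abstract, the sketch below copies a 2D filter along a new time axis and rescales it by the temporal depth. It is not the authors' code: the rescaling rule is an assumption about how inflation is typically done, and the function names `inflate_2d_filter`, `response_2d`, and `response_3d` are hypothetical. Plain nested lists stand in for real tensors.

```python
def inflate_2d_filter(kernel_2d, time_depth):
    """Repeat a 2D kernel along a new time axis, scaling weights by 1/time_depth.

    Assumption: inflation copies the 2D weights to every time step and
    rescales them, so a clip of identical frames produces the same
    response as the original 2D filter on a single frame.
    """
    return [
        [[w / time_depth for w in row] for row in kernel_2d]
        for _ in range(time_depth)
    ]


def response_2d(kernel_2d, patch):
    """Dot product of a 2D kernel with an image patch of the same size."""
    return sum(
        k * p
        for k_row, p_row in zip(kernel_2d, patch)
        for k, p in zip(k_row, p_row)
    )


def response_3d(kernel_3d, clip):
    """Dot product of a 3D kernel with a clip (a list of patches over time)."""
    return sum(response_2d(k, frame) for k, frame in zip(kernel_3d, clip))


if __name__ == "__main__":
    kernel = [[1.0, 2.0], [3.0, 4.0]]
    patch = [[0.5, 1.0], [1.5, 2.0]]
    inflated = inflate_2d_filter(kernel, time_depth=4)
    static_clip = [patch] * 4  # the same frame repeated four times
    print(response_2d(kernel, patch), response_3d(inflated, static_clip))
```

On a static clip the two printed responses agree, which is the property that lets an inflated 3D network start from weights learned on still images.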


Publications referenced by this paper.

Quo vadis, action recognition? A new model and the Kinetics…

  • J. Carreira, A. Zisserman
  • 2017

Appearance-and-relation networks for video classification

  • L. Wang, W. Li, L. V. Gool
  • arXiv preprint arXiv:1711.09125
  • 2017
