Action Recognition with Improved Trajectories
Dense trajectories, which were shown to be an efficient video representation for action recognition and achieved state-of-the-art results on a variety of datasets, are improved by taking camera motion into account to correct them.
Action recognition by dense trajectories
- Heng Wang, Alexander Kläser, C. Schmid, Cheng-Lin Liu
- Computer Science
- Computer Vision and Pattern Recognition
- 20 June 2011
This work introduces a novel descriptor based on motion boundary histograms, which is robust to camera motion and consistently outperforms other state-of-the-art descriptors, in particular in uncontrolled realistic videos.
Dense Trajectories and Motion Boundary Descriptors for Action Recognition
- Heng Wang, Alexander Kläser, C. Schmid, Cheng-Lin Liu
- Computer Science
- International Journal of Computer Vision
- 6 March 2013
The MBH descriptor is shown to consistently outperform other state-of-the-art descriptors, in particular on real-world videos that contain a significant amount of camera motion.
A Closer Look at Spatiotemporal Convolutions for Action Recognition
- Du Tran, Heng Wang, L. Torresani, Jamie Ray, Yann LeCun, Manohar Paluri
- Computer Science
- IEEE/CVF Conference on Computer Vision and…
- 30 November 2017
A new spatiotemporal convolutional block "R(2+1)D" is designed which produces CNNs that achieve results comparable or superior to the state-of-the-art on Sports-1M, Kinetics, UCF101, and HMDB51.
Evaluation of Local Spatio-temporal Features for Action Recognition
- Heng Wang, M. M. Ullah, Alexander Kläser, I. Laptev, C. Schmid
- Computer Science
- British Machine Vision Conference
- 7 September 2009
It is demonstrated that regular sampling of space-time features consistently outperforms all tested space-time interest point detectors for human actions in realistic settings, and that the ranking of the majority of methods is consistent across different datasets.
Is Space-Time Attention All You Need for Video Understanding?
- Gedas Bertasius, Heng Wang, L. Torresani
- Computer Science
- International Conference on Machine Learning
- 9 February 2021
This paper presents a convolution-free approach to video classification built exclusively on self-attention over space and time, and suggests that “divided attention,” where temporal attention and spatial attention are separately applied within each block, leads to the best video classification accuracy among the design choices considered.
Video Classification With Channel-Separated Convolutional Networks
- Du Tran, Heng Wang, L. Torresani, Matt Feiszli
- Computer Science
- IEEE International Conference on Computer Vision
- 4 April 2019
It is empirically demonstrated that the amount of channel interactions plays an important role in the accuracy of 3D group convolutional networks, and this leads to an architecture, the Channel-Separated Convolutional Network (CSN), which is simple, efficient, yet accurate.
A Robust and Efficient Video Representation for Action Recognition
- Heng Wang, Dan Oneaţă, J. Verbeek, C. Schmid
- Computer Science
- International Journal of Computer Vision
- 21 April 2015
It is found that the improved trajectory features significantly outperform previous dense trajectories, and that Fisher vectors are superior to BOW encodings for video recognition tasks.
Large-Scale Weakly-Supervised Pre-Training for Video Action Recognition
- Deepti Ghadiyaram, Matt Feiszli, Du Tran, Xueting Yan, Heng Wang, D. Mahajan
- Computer Science
- Computer Vision and Pattern Recognition
- 2 May 2019
The primary empirical finding is that pre-training at a very large scale (over 65 million videos), despite using noisy social-media videos and hashtags, substantially improves the state of the art on three challenging public action recognition datasets.
Video Modeling With Correlation Networks
- Heng Wang, Du Tran, L. Torresani, Matt Feiszli
- Computer Science
- Computer Vision and Pattern Recognition
- 7 June 2019
This paper proposes an alternative approach based on a learnable correlation operator that can be used to establish frame-to-frame matches over convolutional feature maps in the different layers of the network.