Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

@inproceedings{Carreira2017QuoVA,
  title={Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset},
  author={Jo{\~a}o Carreira and Andrew Zisserman},
  booktitle={2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2017},
  pages={4724--4733}
}
The paucity of videos in current action classification datasets (UCF-101 and HMDB-51) has made it difficult to identify good video architectures, as most methods obtain similar performance on existing small-scale benchmarks. This paper re-evaluates state-of-the-art architectures in light of the new Kinetics Human Action Video dataset. Kinetics has two orders of magnitude more data, with 400 human action classes and over 400 clips per class, and is collected from realistic, challenging YouTube…

Citations

Publications citing this paper.
454 citations in total (estimated 80% coverage); a selection is listed below.

Baidu-UTS Submission to the EPIC-Kitchens Action Recognition Challenge 2019

Xiaohan Wang, Yu Wu, Linchao Zhu, Yi Yang
  • ArXiv
  • 2019
CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

Human Activity Recognition for Edge Devices

CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

Video Action Recognition With an Additional End-to-End Trained Temporal Stream

  • 2019 IEEE Winter Conference on Applications of Computer Vision (WACV)
  • 2019
CITES METHODS & RESULTS
HIGHLY INFLUENCED

Audio-Visual Scene Analysis with Self-Supervised Multisensory Features

CITES RESULTS, BACKGROUND & METHODS
HIGHLY INFLUENCED

DIY Human Action Dataset Generation

  • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
  • 2018
CITES BACKGROUND & METHODS
HIGHLY INFLUENCED

Deep RNN Framework for Visual Sequential Applications

CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

CITATION STATISTICS

  • 177 Highly Influenced Citations

  • Averaged 151 Citations per year over the last 3 years

References

Publications referenced by this paper.
37 references in total; a selection is listed below.

Learning Spatiotemporal Features with 3D Convolutional Networks

  • 2015 IEEE International Conference on Computer Vision (ICCV)
  • 2014
HIGHLY INFLUENTIAL

The Kinetics Human Action Video Dataset

HIGHLY INFLUENTIAL

Large-scale Video Classification with Convolutional Neural Networks

A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, L. Fei-Fei
  • Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1725–1732
  • 2014
HIGHLY INFLUENTIAL

HMDB: A large video database for human motion recognition

  • 2011 International Conference on Computer Vision
  • 2011
HIGHLY INFLUENTIAL

3D Convolutional Neural Networks for Human Action Recognition

  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2010
HIGHLY INFLUENTIAL

Actions ~ Transformations

  • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2016

Convolutional Two-Stream Network Fusion for Video Action Recognition

  • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2016