Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

  • J. Carreira, Andrew Zisserman
  • Published 2017
  • Computer Science
  • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • The paucity of videos in current action classification datasets (UCF-101 and HMDB-51) has made it difficult to identify good video architectures, as most methods obtain similar performance on existing small-scale benchmarks. [...] We also introduce a new Two-Stream Inflated 3D ConvNet (I3D) that is based on 2D ConvNet inflation: filters and pooling kernels of very deep image classification ConvNets are expanded into 3D, making it possible to learn seamless spatio-temporal feature extractors from…
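
The inflation idea in the abstract can be illustrated with a small sketch. It assumes the bootstrapping scheme described in the full paper (not in the excerpt above): a 2D filter is repeated `t` times along a new time axis and rescaled by `1/t`, so that a clip made of one repeated frame produces the same response as the original 2D filter on that frame. The names `inflate_kernel`, `apply_2d`, and `apply_3d` are hypothetical helpers written for this illustration, not part of any released I3D code, and the example covers a single filter at a single position rather than a full network.

```python
from typing import List

Kernel2D = List[List[float]]
Kernel3D = List[List[List[float]]]


def inflate_kernel(k2d: Kernel2D, t: int) -> Kernel3D:
    """Inflate a 2D kernel (kh x kw) into a 3D kernel (t x kh x kw).

    Assumption: weights are copied t times along time and divided by t,
    so the temporal sum of the 3D kernel equals the original 2D kernel.
    """
    if t < 1:
        raise ValueError("temporal extent t must be at least 1")
    return [[[w / t for w in row] for row in k2d] for _ in range(t)]


def apply_2d(patch: Kernel2D, k2d: Kernel2D) -> float:
    """Response of a 2D kernel at one position (elementwise product, summed)."""
    return sum(p * w for p_row, w_row in zip(patch, k2d)
               for p, w in zip(p_row, w_row))


def apply_3d(clip: Kernel3D, k3d: Kernel3D) -> float:
    """Response of a 3D kernel at one position: sum of per-frame 2D responses."""
    return sum(apply_2d(frame, k_slice) for frame, k_slice in zip(clip, k3d))


if __name__ == "__main__":
    # A Sobel-like 3x3 image filter and an arbitrary 3x3 image patch.
    k2d = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]
    patch = [[3.0, 1.0, 0.0], [4.0, 1.0, 5.0], [9.0, 2.0, 6.0]]

    t = 4
    k3d = inflate_kernel(k2d, t)
    static_clip = [patch] * t  # the same frame repeated t times

    # On a static clip the inflated filter matches the 2D filter's response.
    print(apply_2d(patch, k2d), apply_3d(static_clip, k3d))
```

The `1/t` rescaling is the design choice that keeps activations on a static clip identical to the 2D network's activations, which is what allows pretrained image weights to serve as a starting point for the 3D model.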
    1,933 Citations
    • FSD-10: A fine-grained classification dataset for figure skating (1 citation)
    • Improved two-stream model for human action recognition (1 citation)
    • Evaluating the Feasibility of Deep Learning for Action Recognition in Small Datasets (2 citations)
    • Reversing Two-Stream Networks with Decoding Discrepancy Penalty for Robust Action Recognition
    • Temporal Segment Networks for Action Recognition in Videos (136 citations)
    • Action Machine: Rethinking Action Recognition in Trimmed Videos (9 citations)
    • Learn to cycle: Time-consistent feature discovery for action recognition (1 citation)
    • Multi-Task Learning of Generalizable Representations for Video Action Recognition
    • Rethinking Zero-Shot Video Classification: End-to-End Training for Realistic Applications (1 citation)

