Corpus ID: 220713227

AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification

@article{Wang2020AttentionNASSA,
  title={AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification},
  author={Xiaofang Wang and Xuehan Xiong and M. Neumann and A. Piergiovanni and M. Ryoo and A. Angelova and Kris M. Kitani and W. Hua},
  journal={ArXiv},
  year={2020},
  volume={abs/2007.12034}
}
  • Xiaofang Wang, Xuehan Xiong, +5 authors W. Hua
  • Published 2020
  • Computer Science, Engineering
  • ArXiv
  • Convolutional operations have two limitations: (1) they do not explicitly model where to focus, as the same filter is applied to all positions, and (2) they are unsuitable for modeling long-range dependencies, as they only operate on a small neighborhood. While both limitations can be alleviated by attention operations, many design choices remain to be determined to use attention, especially when applying attention to videos. Towards a principled way of applying attention to videos, we address the task…

    Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition

    References

    Publications referenced by this paper.
    SHOWING 1-10 OF 44 REFERENCES
    Attention Augmented Convolutional Networks
    • 94
    Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
    • 1,479
    Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
    • 283
    • Highly Influential
    AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures
    • 17
    Convolutional Two-Stream Network Fusion for Video Action Recognition
    • 1,340
    BAM: Bottleneck Attention Module
    • 98
    Two-Stream Convolutional Networks for Action Recognition in Videos
    • 3,964