Efficient Spatialtemporal Context Modeling for Action Recognition

Congqi Cao, Yue Lu, Yifan Zhang, Dongmei Jiang, Yanning Zhang
Contextual information plays an important role in action recognition. Local operations struggle to model the relation between two elements that are far apart. However, directly modeling the contextual information between every pair of points incurs a huge cost in computation and memory, especially in action recognition, where there is an additional temporal dimension. Inspired by the 2D criss-cross attention used in segmentation tasks, we propose a recurrent 3D criss-cross attention…
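
To make the cost argument concrete, the following is a rough sketch that counts attention pairs for a feature map of size T x H x W. It is not taken from the paper: the function names and the example sizes are hypothetical, and the 3D criss-cross pattern is assumed to let each position attend only to positions on the three axis-aligned lines passing through it (itself counted once), by analogy with 2D criss-cross attention.

```python
def full_attention_pairs(t: int, h: int, w: int) -> int:
    """Pairs scored when every position attends to every position."""
    n = t * h * w
    return n * n


def criss_cross_pairs(t: int, h: int, w: int) -> int:
    """Pairs scored under the ASSUMED 3D criss-cross pattern.

    Each position attends to the t + h + w - 2 positions lying on the
    temporal, vertical, and horizontal lines through it (itself once).
    """
    n = t * h * w
    return n * (t + h + w - 2)


if __name__ == "__main__":
    # Hypothetical feature-map size, chosen only for illustration.
    t, h, w = 8, 14, 14
    full = full_attention_pairs(t, h, w)
    sparse = criss_cross_pairs(t, h, w)
    print(f"full attention:  {full} pairs")
    print(f"criss-cross:     {sparse} pairs")
    print(f"reduction:       {full / sparse:.1f}x")
```

Under these assumptions a single criss-cross pass scores far fewer pairs than dense attention, but a position then sees only the lines through it, which is presumably why the operation is applied recurrently.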
