• Corpus ID: 232307845

Efficient Spatialtemporal Context Modeling for Action Recognition

@article{Cao2021EfficientSC,
  title={Efficient Spatialtemporal Context Modeling for Action Recognition},
  author={Congqi Cao and Yue Lu and Yifan Zhang and Dongmei Jiang and Yanning Zhang},
  journal={ArXiv},
  year={2021},
  volume={abs/2103.11190}
}
Contextual information plays an important role in action recognition. Local operations have difficulty modeling the relation between two elements that are far apart. However, directly modeling the contextual information between every pair of points incurs a huge computation and memory cost, especially for action recognition, where there is an additional temporal dimension. Inspired by the 2D criss-cross attention used in segmentation tasks, we propose a recurrent 3D criss-cross attention…
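To make the cost argument concrete, the sketch below counts the affinity entries needed by full pairwise attention and by a criss-cross pattern on a T × H × W feature volume. It also checks how many recurrent passes are needed before every position has received information from every other position. It assumes, by analogy with 2D criss-cross attention, that one 3D criss-cross pass lets each position attend only to the positions on its temporal, vertical and horizontal lines; the function names and the 8 × 14 × 14 volume size are illustrative choices, not values taken from the paper.

```python
from itertools import product


def full_pairs(t: int, h: int, w: int) -> int:
    """Affinity entries for full (non-local style) attention: every position vs every position."""
    n = t * h * w
    return n * n


def criss_cross_pairs(t: int, h: int, w: int) -> int:
    """Affinity entries when each position attends only to its temporal, vertical and
    horizontal lines (assumed 3D criss-cross pattern); the position itself is counted once."""
    n = t * h * w
    return n * (t + h + w - 2)


def passes_for_full_context(t: int, h: int, w: int) -> int:
    """Number of recurrent criss-cross passes until every position has received
    information from every other position."""
    positions = list(product(range(t), range(h), range(w)))
    n = len(positions)
    # reach[p] = set of source positions whose information has arrived at p so far
    reach = {p: {p} for p in positions}
    passes = 0
    while any(len(sources) < n for sources in reach.values()):
        new_reach = {}
        for p in positions:
            gathered = set()
            for q in positions:
                # q is on p's criss-cross path if they differ in at most one coordinate
                if sum(a != b for a, b in zip(p, q)) <= 1:
                    gathered |= reach[q]
            new_reach[p] = gathered
        reach = new_reach
        passes += 1
    return passes


if __name__ == "__main__":
    print(full_pairs(8, 14, 14), criss_cross_pairs(8, 14, 14))  # 2458624 53312
    print(passes_for_full_context(3, 4, 4), passes_for_full_context(1, 4, 4))  # 3 2
```

With T = 8 and H = W = 14, full attention needs 1568² = 2,458,624 affinity entries, while the criss-cross pattern needs 1568 × 34 = 53,312, roughly 46 times fewer. Because one pass moves information along a single axis, a position that differs from another in all three coordinates is reached only after three passes, whereas the 2D case (T = 1) needs two.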
1 Citation

LIGAR: Lightweight General-purpose Action Recognition

The induced label noise problem is formulated, and an Adaptive Clip Selection (ACS) framework is proposed to address it; together, these make LIGAR a general-purpose action recognition solution.

References

Showing 1-10 of 58 references

Action recognition and localization with spatial and temporal contexts

stagNet: An Attentive Semantic RNN for Group Activity and Individual Action Recognition

A novel attentive semantic recurrent neural network (RNN), stagNet, is presented for understanding group activities and individual actions in videos; it combines a spatio-temporal attention mechanism with semantic graph modeling.

Weakly Semantic Guided Action Recognition

This paper proposes three simple but effective weakly semantic guided modules (SGMs) for both environment-constrained and cross-domain action recognition. The modules are composed entirely of 3-D convolutions and element-wise gated operations, so they are efficient and easy to implement.

Spatial-temporal pyramid based Convolutional Neural Network for action recognition

Temporal–Spatial Mapping for Action Recognition

This work introduces a simple yet effective operation, termed temporal–spatial mapping, for capturing the temporal evolution of the frames by jointly analyzing all the frames of a video, and proposes a temporal attention model within a shallow convolutional neural network to efficiently exploit the temporal–spatial dynamics.
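One plausible reading of "jointly analyzing all the frames" is to flatten each frame's feature map into a single row and stack the rows in time order, so the whole video becomes one 2D map; a temporal attention then weights the rows. The sketch below follows that reading under stated assumptions: the function names are hypothetical, and the shallow CNN that the summary mentions is replaced by a single linear scoring vector `w`, so this illustrates the idea rather than the paper's model.

```python
import numpy as np


def video_map(frame_features: np.ndarray) -> np.ndarray:
    """Flatten each frame's (C, H, W) feature map into one row and stack the rows
    in temporal order, giving a 2D map of shape (T, C*H*W)."""
    return frame_features.reshape(frame_features.shape[0], -1)


def temporal_attention(vmap: np.ndarray, w: np.ndarray):
    """Hypothetical temporal attention: score every row (frame) with vector w,
    softmax the scores over time, return the weights and the weighted sum of rows."""
    scores = vmap @ w
    scores = scores - scores.max()  # subtract the max for numerical stability
    alpha = np.exp(scores)
    alpha /= alpha.sum()
    return alpha, alpha @ vmap


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.standard_normal((5, 2, 3, 3))  # T=5 frames, C=2, H=W=3
    vmap = video_map(features)
    weights, pooled = temporal_attention(vmap, rng.standard_normal(vmap.shape[1]))
    print(vmap.shape, weights.round(3), pooled.shape)
```

Stacking frames as rows keeps temporal order explicit in one axis of the map, so any 2D operator applied afterwards can relate distant frames directly.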

Convolutional relation network for skeleton-based action recognition

Temporal Segment Networks for Action Recognition in Videos

The proposed framework, called temporal segment network (TSN), aims to model long-range temporal structure with a new segment-based sampling and aggregation scheme; it won the video classification track of the ActivityNet challenge 2016 among 24 teams.
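The segment-based sampling and aggregation scheme summarized above can be sketched as follows: the video is split into K equal segments, one snippet is drawn from each, and the per-snippet class scores are fused by a segmental consensus. This sketch reduces a snippet to a single frame index, uses plain averaging as the consensus, draws a random frame per segment during training and takes the segment centre at test time. The function names, the centre-frame test rule and the toy classifier are illustrative assumptions, not TSN's released code.

```python
import random
from typing import Callable, List, Sequence


def segment_indices(num_frames: int, num_segments: int, train: bool,
                    rng: random.Random) -> List[int]:
    """Split frames [0, num_frames) into num_segments contiguous, near-equal segments
    and pick one frame index from each."""
    if num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    indices = []
    for k in range(num_segments):
        start = k * num_frames // num_segments
        end = (k + 1) * num_frames // num_segments  # exclusive; end > start here
        if train:
            indices.append(rng.randrange(start, end))  # random snippet inside the segment
        else:
            indices.append((start + end) // 2)  # assumed test-time rule: segment centre
    return indices


def average_consensus(snippet_scores: Sequence[Sequence[float]]) -> List[float]:
    """Segmental consensus by class-wise averaging of per-snippet scores."""
    k = len(snippet_scores)
    return [sum(column) / k for column in zip(*snippet_scores)]


def tsn_scores(frames: Sequence, classify: Callable[[object], Sequence[float]],
               num_segments: int, train: bool, rng: random.Random) -> List[float]:
    """Video-level scores: classify one sampled frame per segment, then average."""
    picks = segment_indices(len(frames), num_segments, train, rng)
    return average_consensus([classify(frames[i]) for i in picks])


def toy_classifier(frame: int) -> List[float]:
    """Hypothetical two-class scorer whose output depends on the frame position."""
    return [frame / 30, 1 - frame / 30]


if __name__ == "__main__":
    rng = random.Random(0)
    frames = list(range(30))  # stand-in frames: just their indices
    print(segment_indices(30, 3, train=False, rng=rng))  # [5, 15, 25]
    print(tsn_scores(frames, toy_classifier, 3, train=True, rng=rng))
```

Sampling one snippet per segment keeps the per-video cost fixed at K forward passes regardless of video length, while the segments still span the whole video.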

CCNet: Criss-Cross Attention for Semantic Segmentation

  • Zilong Huang, Xinggang Wang, Wenyu Liu
  • 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019
This work proposes a Criss-Cross Network (CCNet) that obtains contextual information more effectively and efficiently, and achieves mIoU scores of 81.4 and 45.22 on the Cityscapes test set and the ADE20K validation set, respectively, which were new state-of-the-art results.
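A single criss-cross attention pass, in the spirit of CCNet, can be sketched in NumPy: each position (i, j) computes affinities only with the H + W - 1 positions in its row and column, normalizes them with a softmax, aggregates the values on that path and adds the input back as a residual. The 1×1 convolutions are written as channel-mixing matrices (`wq`, `wk`, `wv`, names chosen here), and the explicit double loop favours readability over speed; this is an illustrative sketch, not CCNet's implementation.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def criss_cross_attention(x: np.ndarray, wq: np.ndarray, wk: np.ndarray,
                          wv: np.ndarray) -> np.ndarray:
    """One criss-cross attention pass over a (C, H, W) feature map.
    wq, wk: (C', C) query/key projections; wv: (C, C) value projection."""
    _, h, w = x.shape
    # 1x1 convolutions == per-position matrix products over channels
    q = np.einsum("oc,chw->ohw", wq, x)
    k = np.einsum("oc,chw->ohw", wk, x)
    v = np.einsum("oc,chw->ohw", wv, x)
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            # criss-cross path of (i, j): its whole column plus its row without (i, j),
            # i.e. H + W - 1 positions
            row_mask = np.arange(w) != j
            k_path = np.concatenate([k[:, :, j], k[:, i, row_mask]], axis=1)
            v_path = np.concatenate([v[:, :, j], v[:, i, row_mask]], axis=1)
            attn = softmax(q[:, i, j] @ k_path)  # (H + W - 1,) weights
            out[:, i, j] = v_path @ attn + x[:, i, j]  # aggregate, then residual
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c, cq = 4, 2
    x = rng.standard_normal((c, 3, 3))
    wq, wk = rng.standard_normal((cq, c)), rng.standard_normal((cq, c))
    wv = rng.standard_normal((c, c))
    y = criss_cross_attention(x, wq, wk, wv)
    print(y.shape)  # (4, 3, 3)
```

Output at (i, j) depends only on inputs in row i and column j; applying the pass twice, as CCNet's recurrent criss-cross attention does, lets every position gather context from the whole map.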

Action Recognition With Spatio–Temporal Visual Attention on Skeleton Image Sequences

This work redesigned the skeleton representations with a depth-first tree traversal order, which enhanced the semantic meaning of skeleton images and better preserved the associated structural information, and proposed a general two-branch attention architecture that automatically focused on spatio–temporal key stages and filtered out unreliable joint predictions.
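A depth-first traversal that walks back through each parent joint after visiting a child is one way to obtain a joint order in which neighbouring columns of the skeleton image are physically connected joints. The sketch below assumes a hypothetical six-joint skeleton (indices chosen for illustration) and then arranges per-frame joint coordinates in that order; it is not the paper's exact joint ordering.

```python
from typing import Dict, List, Sequence


def dfs_joint_order(tree: Dict[int, List[int]], root: int) -> List[int]:
    """Visit joints depth-first, recording a joint on entry and again after returning
    from each child, so consecutive entries are always parent/child pairs."""
    order = [root]
    for child in tree.get(root, []):
        order.extend(dfs_joint_order(tree, child))
        order.append(root)  # walk back through the parent
    return order


def skeleton_image(frames: Sequence[Sequence[Sequence[float]]],
                   order: List[int]) -> List[List[List[float]]]:
    """Rearrange per-frame joint coordinates (frames x joints x 3) into an 'image':
    rows = frames, columns = joints in traversal order, channels = x/y/z."""
    return [[list(frame[j]) for j in order] for frame in frames]


if __name__ == "__main__":
    # hypothetical skeleton: 0 = torso, 1 = head, 2-3 = left arm, 4-5 = right arm
    skeleton = {0: [1, 2, 4], 2: [3], 4: [5]}
    order = dfs_joint_order(skeleton, 0)
    print(order)  # [0, 1, 0, 2, 3, 2, 0, 4, 5, 4, 0]
```

Repeating joints lengthens each row (2E + 1 columns for E bones instead of E + 1 joints), but it means a small convolution kernel sliding along the joint axis always covers joints that are connected in the body.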
...