Attentive Spatio-Temporal Representation Learning for Diving Classification

@inproceedings{Kanojia2019AttentiveSR,
  title={Attentive Spatio-Temporal Representation Learning for Diving Classification},
  author={Gagan Kanojia and Sudhakar Kumawat and Shanmuganathan Raman},
  booktitle={2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
  year={2019},
  pages={2467--2476}
}
Competitive diving is a well-recognized aquatic sport in which a person dives from a platform or a springboard into the water. Based on the acrobatics performed during the dive, diving is classified into a finite set of action classes standardized by FINA. In this work, we propose an attention-guided LSTM-based neural network architecture for the task of diving classification. The network takes the frames of a diving video as input and determines its class. We evaluate the performance…
Gate-Shift Networks for Video Action Recognition
TLDR
An extensive evaluation of the proposed Gate-Shift Module is performed to study its effectiveness in video action recognition, achieving state-of-the-art results on Something Something-V1 and Diving48 datasets, and obtaining competitive results on EPIC-Kitchens with far less model complexity.
Depthwise Spatio-Temporal STFT Convolutional Neural Networks for Human Action Recognition
TLDR
3D CNNs based on STFT blocks achieve on-par or even better performance compared to the state-of-the-art methods, and their feature learning capabilities are significantly better than those of the conventional 3D convolutional layer and its variants.
EAN: Event Adaptive Network for Enhanced Action Recognition
  • Yuan Tian, Yichao Yan, +4 authors Zhiyong Gao
  • Computer Science
  • ArXiv
  • 2021
TLDR
A unified action recognition framework is proposed to investigate the dynamic nature of video content by introducing designs that are adaptive to the input video content, together with a novel and efficient Latent Motion Code module that further improves the performance of the framework.
Metric-Based Attention Feature Learning for Video Action Recognition
TLDR
A novel attention module that attends only to the action part(s), while neglecting non-action part(s) such as the background, is proposed to enhance the feature representation ability for action recognition.
Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition
TLDR
A rich and robust motion representation based on spatio-temporal self-similarity (STSS), which effectively captures long-term interaction and fast motion in the video, leading to robust action recognition.
HFNet: A Novel Model for Human Focused Sports Action Recognition
TLDR
A novel model to construct visual relationships in images through graph convolutions that is able to pay attention to the changes and details of body parts and achieves state-of-the-art performance on complex human-focused sports datasets FSD-10 and Diving48.
Adaptive Recursive Circle Framework for Fine-grained Action Recognition
  • Hanxi Lin, Xinxiao Wu, Jiebo Luo
  • Computer Science
  • ArXiv
  • 2021
TLDR
An Adaptive Recursive Circle (ARC) framework is proposed, a fine-grained decorator for pure feedforward layers that can facilitate fine-grained action recognition by introducing deeply refined features and multi-scale receptive fields at a low cost.
SportsCap: Monocular 3D Human Motion Capture and Fine-grained Understanding in Challenging Sports Videos
TLDR
This paper proposes SportsCap, the first approach for simultaneously capturing 3D human motions and understanding fine-grained actions from monocular challenging sports video input, and introduces a multistream spatial-temporal Graph Convolutional Network (STGCN) to predict the fine-grained semantic action attributes.
Temporal Query Networks for Fine-grained Video Understanding
TLDR
A new model, the Temporal Query Network, is proposed; it enables query-response functionality and a structural understanding of fine-grained actions in untrimmed videos, and it is compared with other architectures and text supervision methods, with an analysis of their pros and cons.
Video Modeling With Correlation Networks
TLDR
This paper proposes an alternative approach based on a learnable correlation operator that can be used to establish frame-to-frame matches over convolutional feature maps in the different layers of the network.

References

SHOWING 1-10 OF 32 REFERENCES
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
Deep convolutional networks have achieved great success for visual recognition in still images. However, for action recognition in videos, the advantage over traditional methods is not so evident.
Convolutional Neural Networks Based Ball Detection in Tennis Games
TLDR
An innovative deep learning approach to the identification of the ball in a tennis context is presented, exploiting the potential of a convolutional neural network classifier to decide whether a ball is being observed in a single frame, overcoming the typical issues that can occur when dealing with classical approaches on long video sequences.
Hockey Action Recognition via Integrated Stacked Hourglass Network
TLDR
Experimental results show an action recognition accuracy of 65% for four types of actions in hockey; when similar poses are merged into three and two classes, the accuracy increases to 71% and 78%, respectively, proving the efficacy of the methodology for automated action recognition in hockey.
Jersey Number Recognition with Semi-Supervised Spatial Transformer Network
TLDR
This work improves the former network to an end-to-end framework by fusing with the spatial transformer network (STN) and upgrades the model to a semi-supervised multi-task learning system, by labeling a small portion of the number areas in the dataset by quadrangle.
Large-Scale Video Classification with Convolutional Neural Networks
TLDR
This work studies multiple approaches for extending the connectivity of a CNN in the time domain to take advantage of local spatio-temporal information and suggests a multiresolution, foveated architecture as a promising way of speeding up the training.
Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition
TLDR
This paper introduces a new gating mechanism within LSTM to learn the reliability of the sequential input data and accordingly adjust its effect on updating the long-term context information stored in the memory cell, and proposes a more powerful tree-structure based traversal method.
The Kinetics Human Action Video Dataset
TLDR
The dataset, its statistics, and how it was collected are described, and some baseline performance figures for neural network architectures trained and tested for human action classification on this dataset are given.
Differential Recurrent Neural Networks for Action Recognition
TLDR
This study proposes a differential gating scheme for the LSTM neural network, which emphasizes the change in information gain caused by the salient motions between successive frames, and thus the model is termed a differential Recurrent Neural Network (dRNN).
A Closer Look at Spatiotemporal Convolutions for Action Recognition
TLDR
A new spatiotemporal convolutional block "R(2+1)D" is designed which produces CNNs that achieve results comparable or superior to the state-of-the-art on Sports-1M, Kinetics, UCF101, and HMDB51.
Learning Spatiotemporal Features with 3D Convolutional Networks
TLDR
The learned features, namely C3D (Convolutional 3D), with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with current best methods on the other 2 benchmarks.