Corpus ID: 237485612

FineAction: A Fine-Grained Video Dataset for Temporal Action Localization

  • Yi Liu, Limin Wang, Xiao Ma, Yali Wang, Yu Qiao
  • Published 2021
  • Computer Science
Temporal action localization (TAL) is an important and challenging problem in video understanding. However, most existing TAL benchmarks are built upon coarse-grained action classes, which exhibits two major limitations. First, coarse-level actions can cause localization models to overfit to high-level context information and ignore the atomic action details in the video. Second, coarse action classes often lead to ambiguous annotations of temporal boundaries…

FineGym: A Hierarchical Video Dataset for Fine-Grained Action Understanding
FineGym is a new dataset built on top of gymnastics videos that provides temporal annotations at both action and sub-action levels with a three-level semantic hierarchy; the authors systematically investigate different methods on this dataset and obtain a number of interesting findings.
MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions
This paper presents a new multi-person dataset of spatio-temporally localized sports actions, coined MultiSports, with the important properties of strong diversity, detailed annotation, and high quality, which the authors hope can serve as a standard benchmark for spatiotemporal action detection in the future.
Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs
A novel loss function for the localization network is proposed to explicitly consider temporal overlap and achieve high temporal localization accuracy in untrimmed long videos.
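As background on how temporal overlap is commonly measured in such losses, here is a minimal sketch of temporal Intersection-over-Union between two (start, end) segments; the function name and signature are illustrative assumptions, not the paper's actual implementation:

```python
def temporal_iou(seg_a, seg_b):
    """Temporal Intersection-over-Union of two (start, end) segments."""
    start_a, end_a = seg_a
    start_b, end_b = seg_b
    # Overlap length, clamped at zero when the segments are disjoint.
    inter = max(0.0, min(end_a, end_b) - max(start_a, start_b))
    # Union = sum of both lengths minus the overlap counted twice.
    union = (end_a - start_a) + (end_b - start_b) - inter
    return inter / union if union > 0 else 0.0
```

For example, temporal_iou((2.0, 6.0), (4.0, 8.0)) gives an overlap of 2 over a union of 6, i.e. about 0.33.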
AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions
  • C. Gu, Chen Sun, +8 authors J. Malik
  • Computer Science
  • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
  • 2018
The AVA dataset densely annotates 80 atomic visual actions in 437 15-minute video clips, where actions are localized in space and time, resulting in 1.59M action labels with multiple labels per person occurring frequently.
Temporal Action Detection with Structured Segment Networks
The structured segment network (SSN) is presented, a novel framework which models the temporal structure of each action instance via a structured temporal pyramid and introduces a decomposed discriminative model comprising two classifiers, one for classifying actions and one for determining completeness.
Fast Learning of Temporal Action Proposal via Dense Boundary Generator
An efficient and unified framework to generate temporal action proposals named Dense Boundary Generator (DBG), which draws inspiration from boundary-sensitive methods and implements boundary classification and action completeness regression for densely distributed proposals.
Intra- and Inter-Action Understanding via Temporal Action Parsing
This study shows that a sports activity usually consists of multiple sub-actions and that awareness of such temporal structures benefits action recognition; it investigates a number of temporal parsing methods and devises an improved method capable of mining sub-actions from training data without knowing their labels.
G-TAD: Sub-Graph Localization for Temporal Action Detection
This work proposes a graph convolutional network (GCN) model to adaptively incorporate multi-level semantic context into video features, casting temporal action detection as a sub-graph localization problem.
BSN: Boundary Sensitive Network for Temporal Action Proposal Generation
An effective proposal generation method, named Boundary-Sensitive Network (BSN), which adopts a "local to global" fashion and significantly improves state-of-the-art temporal action detection performance.
Temporal Segment Networks for Action Recognition in Videos
The proposed temporal segment network (TSN) framework aims to model long-range temporal structure with a new segment-based sampling and aggregation scheme, and won the video classification track of the ActivityNet challenge 2016 among 24 teams.
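To illustrate the segment-based sampling idea above, here is a hedged sketch of TSN-style sparse sampling (the function name and defaults are assumptions, not the authors' code): the video is split into equal-length segments and one frame index is drawn from each, giving sparse coverage of the whole duration:

```python
import random

def sample_segment_indices(num_frames, num_segments=3, training=True):
    """Split a video into equal segments and pick one frame index per segment.

    During training a random index is drawn within each segment (sparse
    stochastic sampling); at test time the segment centers are used.
    """
    seg_len = num_frames / num_segments
    indices = []
    for i in range(num_segments):
        offset = random.random() if training else 0.5
        indices.append(int(i * seg_len + offset * seg_len))
    return indices
```

For a 90-frame video with 3 segments, deterministic (test-time) sampling returns the segment centers [15, 45, 75]; the per-segment predictions are then aggregated (e.g. averaged) into a video-level score.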