Corpus ID: 234742617

MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions

  • Yixuan Li, Lei Chen, Runyu He, Zhenzhi Wang, Gangshan Wu, Limin Wang
Spatio-temporal action detection is an important and challenging problem in video understanding. Existing action detection benchmarks are limited either by the small number of instances in each trimmed video or by their focus on relatively low-level atomic actions. This paper presents a new multi-person dataset of spatio-temporally localized sports actions, coined MultiSports. We first analyze the important ingredients of constructing a realistic and challenging dataset for spatio-temporal action…
FineAction: A Fine-Grained Video Dataset for Temporal Action Localization
  • Yi Liu, Limin Wang, Xiao Ma, Yali Wang, Yu Qiao
  • Computer Science
  • 2021
Temporal action localization (TAL) is an important and challenging problem in video understanding. However, most existing TAL benchmarks are built upon the coarse granularity of action classes, which…


AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions
  • C. Gu, Chen Sun, +8 authors J. Malik
  • Computer Science
  • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
  • 2018
The AVA dataset densely annotates 80 atomic visual actions in 437 15-minute video clips, where actions are localized in space and time, resulting in 1.59M action labels with multiple labels per person occurring frequently.
MEVA: A Large-Scale Multiview, Multimodal Video Dataset for Activity Detection
This work presents the Multiview Extended Video with Activities (MEVA) dataset, a new, very-large-scale dataset for human activity recognition, scripted to include diverse, simultaneous activities along with spontaneous background activity.
STEP: Spatio-Temporal Progressive Learning for Video Action Detection
Compared to prior work that performs action detection in one run, the proposed Spatio-TEmporal Progressive (STEP) action detector naturally handles the spatial displacement within action tubes and therefore provides a more effective way to model spatio-temporal information.
Temporal Action Detection with Structured Segment Networks
The structured segment network (SSN) is presented, a novel framework which models the temporal structure of each action instance via a structured temporal pyramid and introduces a decomposed discriminative model comprising two classifiers, respectively for classifying actions and determining completeness.
Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding
This work augments the existing video dataset, Moments in Time, to include over two million action labels for over one million three-second videos, and introduces novel challenges on how to train and analyze models for multi-action detection.
Online Real-Time Multiple Spatiotemporal Action Localisation and Prediction
This work presents a deep-learning framework for real-time multiple spatio-temporal (S/T) action localisation and classification that is not only capable of performing S/T detection in real time, but can also perform early action prediction in an online fashion.
FineGym: A Hierarchical Video Dataset for Fine-Grained Action Understanding
FineGym is a new dataset built on top of gymnasium videos that provides temporal annotations at both action and sub-action levels with a three-level semantic hierarchy. The work systematically investigates different methods on this dataset and obtains a number of interesting findings.
Multi-region Two-Stream R-CNN for Action Detection
A multi-region two-stream R-CNN model for action detection in realistic videos is proposed; it links frame-level detections with the Viterbi algorithm and temporally localizes an action with the maximum subarray method.
Context-Aware RCNN: A Baseline for Action Detection in Videos
This work empirically finds that recognition accuracy is highly correlated with the bounding box size of an actor, and thus that higher resolution of actors contributes to better performance. It develops a surprisingly effective baseline (Context-Aware RCNN) that achieves new state-of-the-art results on two challenging action detection benchmarks, AVA and JHMDB.
Learning to Track for Spatio-Temporal Action Localization
The approach first detects proposals at the frame-level and scores them with a combination of static and motion CNN features, then tracks high-scoring proposals throughout the video using a tracking-by-detection approach that outperforms the state of the art with a margin of 15%, 7% and 12% respectively in mAP.