COMPOSER: Compositional Reasoning of Group Activity in Videos with Keypoint-Only Modality

Honglu Zhou, Asim Kadav, Aviv Shamsian, Shijie Geng, Farley Lai, Long Zhao, Tingxi Liu, Mubbasir Kapadia, Hans Peter Graf. European Conference on Computer Vision.

Group Activity Recognition detects the activity collectively performed by a group of actors, a task that requires compositional reasoning over actors and objects. The task is approached by modeling the video as tokens that represent multi-scale semantic concepts, and COMPOSER, a Multiscale Transformer-based architecture, performs attention-based reasoning over the tokens at each scale and learns group activity compositionally. In addition, prior works suffer from scene biases…
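The multi-scale token reasoning described above can be sketched in a few lines. This is a hypothetical NumPy illustration, not the authors' implementation: tokens at each semantic scale (e.g. keypoints, persons, groups; the scale sizes and dimensions below are made up) are refined by self-attention, pooled per scale, and averaged into a group-level representation.

```python
import numpy as np

def self_attention(x):
    # Single-head scaled dot-product self-attention over a set of tokens.
    scores = x @ x.T / np.sqrt(x.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)      # row-wise softmax
    return w @ x

def multiscale_reason(scales):
    # scales: list of (num_tokens, dim) arrays, one per semantic scale.
    # Each scale is refined by attention, mean-pooled over its tokens,
    # and the pooled scale representations are averaged.
    pooled = [self_attention(x).mean(axis=0) for x in scales]
    return np.mean(pooled, axis=0)

rng = np.random.default_rng(0)
# Illustrative token counts: 17 keypoints, 12 persons, 4 sub-groups.
scales = [rng.standard_normal((n, 32)) for n in (17, 12, 4)]
group_repr = multiscale_reason(scales)
print(group_repr.shape)  # (32,)
```

The real model applies attention repeatedly and propagates information between scales; this sketch shows only the core pattern of per-scale attention followed by pooling.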

RIT-18: A Novel Dataset for Compositional Group Activity Understanding

This paper proposes RIT-18, a new large-scale untrimmed compositional group activity dataset built from volleyball games captured on YouTube; it defines group activity recognition, future activity anticipation, and rally-level winner prediction challenges, and evaluates several baseline methods on these challenges.

stagNet: An Attentive Semantic RNN for Group Activity Recognition

A novel attentive semantic recurrent neural network (RNN) for understanding group activities in videos, dubbed stagNet, is proposed; it is based on spatio-temporal attention and a semantic graph, and attends to key persons/frames for improved performance.

SAFCAR: Structured Attention Fusion for Compositional Action Recognition

A novel Structured Attention Fusion (SAF) self-attention mechanism is developed and tested to combine information from object detections, which capture the time-series structure of an action, with visual cues that capture contextual information.

Home Action Genome: Cooperative Compositional Action Understanding

HOMAGE is introduced: a multi-view action dataset with multiple modalities and viewpoints, supplemented with hierarchical activity and atomic-action labels as well as dense scene-composition labels, together with Cooperative Compositional Action Understanding (CCAU), a cooperative learning framework for hierarchical action recognition that is aware of compositional action elements.

Skeleton-based Relational Reasoning for Group Activity Analysis

Learning Visual Context for Group Activity Recognition

This paper proposes a new reasoning paradigm to incorporate global contextual information, a Transformer-based Context Encoding (TCE) module, which enhances individual representations by encoding global contextual information into individual features and refining the aggregated information.

Progressive Relation Learning for Group Activity Recognition

A novel method based on deep reinforcement learning progressively refines the low-level features and high-level relations of group activities, and constructs a semantic relation graph (SRG) to explicitly model the relations among persons.

Actor-Transformers for Group Activity Recognition

This paper proposes an actor-transformer model able to learn and selectively extract information relevant to group activity recognition, and achieves state-of-the-art results on two publicly available benchmarks for group activity recognition.

Learning Actor Relation Graphs for Group Activity Recognition

This paper proposes a flexible and efficient Actor Relation Graph (ARG) that simultaneously captures the appearance and position relations between actors, and performs extensive experiments on two standard group activity recognition datasets.
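The idea of a relation graph built from both appearance and position can be sketched as follows. This is an illustrative NumPy approximation, not the ARG paper's exact formulation: edge weights combine appearance affinity (dot product) with a distance-based position term (the additive fusion chosen here is an assumption), and each actor's feature is then refined by aggregating its neighbours' features.

```python
import numpy as np

def relation_graph(features, positions):
    # features: (N, D) actor appearance features; positions: (N, 2) box centers.
    app = features @ features.T                                   # appearance affinity
    dist = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
    logits = app - dist                                           # closer actors relate more
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                             # row-wise softmax = graph edges
    return w @ features                                           # relational feature refinement

rng = np.random.default_rng(0)
feats = rng.standard_normal((5, 16))     # 5 actors, 16-dim features (illustrative)
pos = rng.random((5, 2))
refined = relation_graph(feats, pos)
print(refined.shape)  # (5, 16)
```

The refined per-actor features would then be pooled and classified; the original work learns the relation functions rather than using fixed dot products and distances as here.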

Spatio-temporal attention mechanisms based model for collective activity recognition