Corpus ID: 235457994

BABEL: Bodies, Action and Behavior with English Labels

@inproceedings{Punnakkal2021BABELBA,
  title={BABEL: Bodies, Action and Behavior with English Labels},
  author={Abhinanda R. Punnakkal and Arjun Chandrasekaran and Nikos Athanasiou and Alejandra Quiros-Ramirez and Michael J. Black},
  booktitle={CVPR},
  year={2021}
}
Understanding the semantics of human movement – the what, how and why of the movement – is an important problem that requires datasets of human actions with semantic labels. Existing datasets take one of two approaches. Large-scale video datasets contain many action labels but do not contain ground-truth 3D human motion. Alternatively, motion-capture (mocap) datasets have precise body motions but are limited to a small number of actions. To address this, we present BABEL, a large dataset with…
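The core idea in the abstract, semantic action labels aligned with ground-truth 3D motion, can be made concrete with a small sketch. The Python snippet below converts hypothetical segment-level annotations into per-frame label lists; the field names (label, start_t, end_t), the function name, and the layout are illustrative assumptions for this page, not BABEL's actual schema.

# A minimal sketch of aligning segment-level action labels with mocap frames.
# The annotation layout below is hypothetical; it only illustrates the idea of
# semantic labels over ground-truth 3D motion, NOT BABEL's actual schema.

from typing import Dict, List

def frame_labels(
    segments: List[Dict],   # each: {"label": str, "start_t": float, "end_t": float}
    num_frames: int,
    fps: float,
) -> List[List[str]]:
    """Return, for each mocap frame, the action labels active at that frame.

    Segments may overlap (e.g. "walk" while "wave"), so a frame can carry
    multiple labels.
    """
    labels: List[List[str]] = [[] for _ in range(num_frames)]
    for seg in segments:
        # Convert segment times (seconds) to frame indices, clamped to the clip.
        start = max(0, int(seg["start_t"] * fps))
        end = min(num_frames, int(seg["end_t"] * fps) + 1)
        for f in range(start, end):
            labels[f].append(seg["label"])
    return labels

if __name__ == "__main__":
    # Two overlapping annotated segments on a 4-second clip at 30 fps.
    segs = [
        {"label": "walk", "start_t": 0.0, "end_t": 4.0},
        {"label": "wave", "start_t": 1.5, "end_t": 2.5},
    ]
    per_frame = frame_labels(segs, num_frames=120, fps=30.0)
    print(per_frame[60])  # frame at t=2.0s -> ['walk', 'wave']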
1 Citation


Human Action Recognition from Various Data Modalities: A Review
TLDR: This paper reviews both hand-crafted feature-based and deep-learning-based methods for single data modalities, as well as methods based on multiple modalities, including fusion-based frameworks and co-learning-based approaches for HAR.

References

SHOWING 1-10 OF 57 REFERENCES
Action2Motion: Conditioned Generation of 3D Human Motions
TLDR: This paper aims to generate plausible human motion sequences in 3D given a prescribed action type, and proposes a temporal Variational Auto-Encoder (VAE) that encourages a diverse sampling of the motion space.
Watch-n-patch: Unsupervised understanding of actions and relations
TLDR: The model learns high-level action co-occurrence and temporal relations between the actions in an activity video; it is applied to unsupervised action segmentation and recognition, and to a novel application, called action patching, that detects forgotten actions.
Benchmarking Search and Annotation in Continuous Human Skeleton Sequences
TLDR: A new large-scale LSMB19 dataset, consisting of two 3D skeleton sequences with a total length of 54.5 hours, is introduced, and a benchmark is defined on two important multimedia retrieval operations: subsequence search and annotation.
Going deeper into action recognition: A survey
TLDR: This survey provides a comprehensive review of the notable steps taken towards recognizing human actions, starting with pioneering methods that use handcrafted representations and then moving into the realm of deep-learning-based approaches.
HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization
  • Hang Zhao, Zhicheng Yan, Heng Wang, Lorenzo Torresani
  • Computer Science
  • 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
  • 2019
TLDR: On HACS Segments, the state-of-the-art methods of action proposal generation and action localization are evaluated, and the new challenges posed by the dense temporal annotations are highlighted.
AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions
  • C. Gu, Chen Sun, +8 authors J. Malik
  • Computer Science
  • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
  • 2018
TLDR: The AVA dataset densely annotates 80 atomic visual actions in 437 15-minute video clips, with actions localized in space and time, resulting in 1.59M action labels; multiple labels per person occur frequently.
The KIT whole-body human motion database
We present a large-scale whole-body human motion database consisting of captured raw motion data as well as the corresponding post-processed motions. This database serves as a key element for a wide…
Language2Pose: Natural Language Grounded Pose Forecasting
TLDR: This paper introduces a neural architecture called Joint Language-to-Pose (JL2P), which learns a joint embedding of language and pose, and evaluates the model on a publicly available corpus of 3D pose data and human-annotated sentences.
Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos
TLDR: A novel variant of long short-term memory deep networks is defined for modeling temporal relations among actions via multiple input and output connections; this model is shown to improve action-labeling accuracy and to enable deeper understanding tasks ranging from structured retrieval to action prediction.
The THUMOS challenge on action recognition for videos "in the wild"
TLDR: The THUMOS benchmark is described in detail, along with an overview of data collection and annotation procedures, and a comprehensive empirical study evaluates the differences in action recognition between trimmed and untrimmed videos and how well methods trained on trimmed videos generalize to untrimmed videos.