AViD Dataset: Anonymized Videos from Diverse Countries
@article{Piergiovanni2020AViDDA,
  title   = {AViD Dataset: Anonymized Videos from Diverse Countries},
  author  = {A. J. Piergiovanni and Michael S. Ryoo},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2007.05515}
}
We introduce a new public video dataset for action recognition: Anonymized Videos from Diverse countries (AViD). Unlike existing public video datasets, AViD is a collection of action videos from many different countries. The motivation is to create a public dataset that benefits the training and pretraining of action recognition models for everybody, rather than being useful only to a limited set of countries. Further, all the face identities in the AViD videos are properly anonymized to protect their…
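The truncated abstract does not spell out how the anonymization is done, but the core idea, detecting faces and obscuring them before release, is easy to illustrate. Below is a minimal sketch using OpenCV with a stock Haar-cascade detector; the file path is hypothetical and this is not the AViD authors' actual pipeline:

```python
# Minimal face-blurring sketch (illustrative only; not the AViD authors'
# actual anonymization pipeline, which the truncated abstract leaves open).
import cv2

def anonymize_frame(frame, cascade):
    """Detect faces in a BGR frame and replace each with a Gaussian blur."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        roi = frame[y:y + h, x:x + w]
        frame[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (51, 51), 0)
    return frame

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture("input_video.mp4")  # hypothetical input path
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame = anonymize_frame(frame, cascade)
    # ... hand the frame to a cv2.VideoWriter to produce the output ...
cap.release()
```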
21 Citations
A Study of Face Obfuscation in ImageNet
- Computer Science · ICML
- 2022
The effects of face obfuscation on the popular ImageNet challenge visual recognition benchmark are explored, the feasibility of privacy-aware visual recognition is demonstrated, the widely used ImageNet challenge benchmark is improved, and an important path for future visual datasets is suggested.
A Comprehensive Study of Deep Video Action Recognition
- Computer Science · ArXiv
- 2020
A comprehensive survey of over 200 existing papers on deep learning for video action recognition is provided, covering early attempts at adapting deep learning, the two-stream networks that followed, the adoption of 3D convolutional kernels, and finally the recent compute-efficient models.
Video Action Understanding: A Tutorial
- Computer Science · ArXiv
- 2020
This tutorial clarifies a taxonomy of video action problems, highlights datasets and metrics used to baseline each problem, describes common data preparation methods, and presents the building blocks of state-of-the-art deep learning model architectures.
Activity Graph Transformer for Temporal Action Localization
- Computer Science · ArXiv
- 2021
An end-to-end learnable model for temporal action localization that receives a video as input and directly predicts the set of action instances appearing in it, outperforming the state of the art by a considerable margin.
A Review of Deep Learning for Video Captioning
- Computer Science · ArXiv
- 2023
This survey covers deep learning-based video captioning (VC), including, but not limited to, attention-based architectures, graph networks, reinforcement learning, adversarial networks, dense video captioning (DVC), and more.
A Comprehensive Review of Recent Deep Learning Techniques for Human Activity Recognition
- Computer Science · Computational Intelligence and Neuroscience
- 2022
This survey covers recent convolution-free methods, which replace convolutional networks with transformer networks and have achieved state-of-the-art results on many human action recognition datasets.
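As a concrete picture of the convolution-free pattern such surveys describe, here is a toy PyTorch sketch: the video is cut into spatio-temporal patches ("tubelets"), embedded, and processed by a standard Transformer encoder. All names and sizes are illustrative and not taken from any particular paper:

```python
import torch
import torch.nn as nn

class TinyVideoTransformer(nn.Module):
    """Toy convolution-free-style video classifier: non-overlapping
    spatio-temporal patches ("tubelets") fed to a Transformer encoder."""

    def __init__(self, num_classes=400, dim=256, depth=4, heads=8,
                 tubelet=(2, 16, 16), frames=16, size=128):
        super().__init__()
        t, h, w = tubelet
        n_tokens = (frames // t) * (size // h) * (size // w)
        # The tubelet embedding is a strided 3D projection, equivalent to a
        # linear projection of each flattened patch.
        self.embed = nn.Conv3d(3, dim, kernel_size=tubelet, stride=tubelet)
        self.pos = nn.Parameter(torch.zeros(1, n_tokens, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, video):             # video: (B, 3, T, H, W)
        x = self.embed(video)             # (B, dim, T', H', W')
        x = x.flatten(2).transpose(1, 2)  # (B, tokens, dim)
        x = self.encoder(x + self.pos)
        return self.head(x.mean(dim=1))   # mean-pool tokens, then classify
```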
A review of video action recognition based on 3D convolution
- Computers and Electrical Engineering
- 2023
Ethical Considerations for Collecting Human-Centric Image Datasets
- Computer Science · ArXiv
- 2023
The research directly addresses issues of privacy and bias by contributing best practices for ethical data collection to the research community, covering purpose, privacy and consent, as well as diversity.
Baseline Method for the Sport Task of MediaEval 2022 with 3D CNNs using Attention Mechanisms
- Computer Science · ArXiv
- 2023
This paper presents the baseline method proposed for the Sports Video task of the MediaEval 2022 benchmark. The task comprises two subtasks: stroke classification from trimmed videos, and stroke…
Fine-grained Activities of People Worldwide
- Computer Science · 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
- 2023
This work presents Collect, a free mobile app to record video while simultaneously annotating objects and activities of consented subjects; it provides activity classification and activity detection benchmarks for this dataset and analyzes baseline results to gain insight into how people around the world perform common activities.
32 References
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
- Computer Science · 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
A new Two-Stream Inflated 3D ConvNet (I3D) based on 2D ConvNet inflation is introduced, and I3D models considerably improve upon the state of the art in action classification, reaching 80.2% on HMDB-51 and 97.9% on UCF-101 after pre-training on Kinetics.
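The inflation trick itself is compact: each pretrained 2D kernel is tiled T times along a new temporal axis and rescaled by 1/T, so that on a temporally constant ("boring") video the inflated network initially reproduces the 2D network's activations. A minimal PyTorch sketch (the helper name is ours, not from the paper):

```python
import torch
import torch.nn as nn

def inflate_conv2d(conv2d: nn.Conv2d, time_dim: int) -> nn.Conv3d:
    """Bootstrap a 3D conv from a pretrained 2D conv by tiling the kernel
    along time and dividing by time_dim, so a temporally constant input
    initially produces the same activations as the original 2D layer."""
    conv3d = nn.Conv3d(
        conv2d.in_channels, conv2d.out_channels,
        kernel_size=(time_dim, *conv2d.kernel_size),
        stride=(1, *conv2d.stride),
        padding=(time_dim // 2, *conv2d.padding),
        bias=conv2d.bias is not None,
    )
    with torch.no_grad():
        w2d = conv2d.weight  # (out, in, kH, kW)
        conv3d.weight.copy_(
            w2d.unsqueeze(2).repeat(1, 1, time_dim, 1, 1) / time_dim)
        if conv2d.bias is not None:
            conv3d.bias.copy_(conv2d.bias)
    return conv3d
```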
C3D: Generic Features for Video Analysis
- Computer Science · ArXiv
- 2014
Convolution 3D (C3D) features are proposed: a generic spatio-temporal feature obtained by training a deep 3-dimensional convolutional network on a large annotated video dataset comprising objects, scenes, actions, and other frequently occurring concepts. These features encapsulate appearance and motion cues and perform well on different video classification tasks.
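The paper's central design finding is that small, homogeneous 3x3x3 kernels throughout the network work best. One such building block might look like the following sketch (sizes are illustrative; the real C3D stacks eight conv layers and pools time more gently in the first stage):

```python
import torch.nn as nn

def c3d_block(in_ch, out_ch, pool=(2, 2, 2)):
    """One C3D-style block: a 3x3x3 convolution over (time, height, width),
    ReLU, then spatio-temporal max pooling."""
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool3d(kernel_size=pool, stride=pool),
    )

# e.g. a (B, 3, 16, 112, 112) clip -> (B, 64, 8, 56, 56) feature volume
block = c3d_block(3, 64)
```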
YouTube-8M: A Large-Scale Video Classification Benchmark
- Computer Science · ArXiv
- 2016
YouTube-8M is introduced, the largest multi-label video classification dataset, composed of ~8 million videos (500K hours of video), annotated with a vocabulary of 4800 visual entities, and various (modest) classification models are trained on the dataset.
Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding
- Computer Science · ECCV
- 2016
This work proposes a novel Hollywood in Homes approach to collect data, collecting a new dataset, Charades, with hundreds of people recording videos in their own homes, acting out casual everyday activities, and evaluates and provides baseline results for several tasks including action recognition and automatic description generation.
Two-Stream Convolutional Networks for Action Recognition in Videos
- Computer Science · NIPS
- 2014
This work proposes a two-stream ConvNet architecture which incorporates spatial and temporal networks and demonstrates that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data.
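The fusion step is simple to sketch: each stream classifies independently, one from an RGB frame and one from a stack of dense optical-flow fields, and the class scores are combined (the paper also explores SVM-based fusion; only the simpler averaging variant is shown here):

```python
import torch.nn as nn

class TwoStream(nn.Module):
    """Late-fusion two-stream sketch: a spatial stream on an RGB frame and a
    temporal stream on L stacked optical-flow fields (2L input channels)."""

    def __init__(self, spatial: nn.Module, temporal: nn.Module):
        super().__init__()
        self.spatial = spatial    # any 2D CNN with 3 input channels
        self.temporal = temporal  # the same CNN shape with 2L input channels

    def forward(self, rgb, flow):
        # Fuse by averaging the per-stream softmax class scores.
        p_rgb = self.spatial(rgb).softmax(dim=-1)
        p_flow = self.temporal(flow).softmax(dim=-1)
        return (p_rgb + p_flow) / 2
```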
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
- Computer Science · 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
It is demonstrated that a text-video embedding trained on this data leads to state-of-the-art results for text-to-video retrieval and action localization on instructional video datasets such as YouCook2 or CrossTask.
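Joint text-video embeddings of this kind are typically trained with a max-margin ranking loss that pulls matched clip-narration pairs together and pushes mismatched pairs apart. A minimal PyTorch sketch of such a loss, as an illustration of the general recipe rather than the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def ranking_loss(video_emb, text_emb, margin=0.2):
    """Max-margin ranking loss over a batch of matched (video, text) pairs:
    every mismatched pair is penalized if it scores within `margin` of the
    matched pair, in both retrieval directions."""
    v = F.normalize(video_emb, dim=-1)     # (B, D)
    t = F.normalize(text_emb, dim=-1)      # (B, D)
    sim = v @ t.T                          # (B, B) cosine similarities
    pos = sim.diag().unsqueeze(1)          # matched-pair similarities
    cost = (margin + sim - pos).clamp(min=0) \
         + (margin + sim - pos.T).clamp(min=0)
    cost.fill_diagonal_(0)                 # the positives incur no cost
    return cost.mean()
```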
HMDB: A large video database for human motion recognition
- Computer Science · 2011 International Conference on Computer Vision
- 2011
This paper uses the largest action video database to date, with 51 action categories containing around 7,000 manually annotated clips extracted from a variety of sources ranging from digitized movies to YouTube, to evaluate the performance of two representative computer vision systems for action recognition and to explore the robustness of these methods under various conditions.
HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization
- Computer Science · 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
On HACS Segments, the state-of-the-art methods of action proposal generation and action localization are evaluated, and the new challenges posed by the dense temporal annotations are highlighted.
The Kinetics Human Action Video Dataset
- Computer Science · ArXiv
- 2017
The dataset, its statistics, and how it was collected are described, and some baseline performance figures are given for neural network architectures trained and tested for human action classification on this dataset.
AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions
- Computer Science · 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
The AVA dataset densely annotates 80 atomic visual actions in 437 15-minute video clips, where actions are localized in space and time, resulting in 1.59M action labels with multiple labels per person occurring frequently.