Corpus ID: 233004476

Visual Semantic Role Labeling for Video Understanding

@article{Sadhu2021VisualSR,
  title={Visual Semantic Role Labeling for Video Understanding},
  author={Arka Sadhu and Tanmay Gupta and Mark Yatskar and R. Nevatia and Aniruddha Kembhavi},
  journal={ArXiv},
  year={2021},
  volume={abs/2104.00990}
}
We propose a new framework for understanding and representing related salient events in a video using visual semantic role labeling. We represent videos as a set of related events, wherein each event consists of a verb and multiple entities that fulfill various roles relevant to that event. To study the challenging task of semantic role labeling in videos, or VidSRL, we introduce the VidSitu benchmark, a large-scale video understanding data source with 29K 10-second movie clips richly annotated…
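The event representation described in the abstract can be sketched as a simple data structure. This is a minimal illustrative sketch, not the authors' actual schema: the class names, role labels, and example entities below are all hypothetical, assuming only what the abstract states (each event pairs a verb with entities filling semantic roles, and a clip holds a set of related events).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    """One salient event: a verb plus role -> entity assignments."""
    verb: str                                            # e.g. "punch" (hypothetical)
    roles: Dict[str, str] = field(default_factory=dict)  # role label -> entity phrase


@dataclass
class VideoClip:
    """A clip represented as a set of related events, per the abstract."""
    clip_id: str
    events: List[Event] = field(default_factory=list)


# Hypothetical example of what one annotated 10-second clip might look like.
clip = VideoClip(
    clip_id="example_clip",
    events=[
        Event(verb="punch", roles={"Arg0": "man in suit", "Arg1": "intruder"}),
        Event(verb="fall", roles={"Arg1": "intruder", "ArgM-LOC": "hallway"}),
    ],
)
print(len(clip.events))  # 2
```

Representing roles as a verb-keyed mapping makes it easy to query which entity fills a given role across events, which is the kind of cross-event entity relationship the benchmark is designed to study.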
