Corpus ID: 237490775

MovieCuts: A New Dataset and Benchmark for Cut Type Recognition

@article{Pardo2021MovieCutsAN,
  title={MovieCuts: A New Dataset and Benchmark for Cut Type Recognition},
  author={A. Pardo and Fabian Caba Heilbron and Juan Le{\'o}n Alc{\'a}zar and Ali K. Thabet and Bernard Ghanem},
  journal={ArXiv},
  year={2021},
  volume={abs/2109.05569}
}
Understanding movies and their structural patterns is a crucial task to decode the craft of video editing. While previous works have developed tools for general analysis such as detecting characters or recognizing cinematography properties at the shot level, less effort has been devoted to understanding the most basic video edit, the Cut. This paper introduces the cut type recognition task, which requires modeling of multi-modal information. To ignite research in the new task, we construct a…

References

A Unified Framework for Shot Type Classification Based on Subject Centric Lens
TLDR
A learning framework, Subject Guidance Network (SGNet), is proposed for shot type recognition; it separates the subject and background of a shot into two streams, which serve as separate guidance maps for scale and movement type classification, respectively.
Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks
  • Michael Gygli
  • Computer Science
  • 2018 International Conference on Content-Based Multimedia Indexing (CBMI)
  • 2018
TLDR
This work proposes a Convolutional Neural Network (CNN) that is fully convolutional in time, allowing the use of a large temporal context without the need to repeatedly process frames.
MovieNet: A Holistic Dataset for Movie Understanding
TLDR
MovieNet is the largest dataset with the richest annotations for comprehensive movie understanding, and it is believed that such a holistic dataset would promote research on story-based long video understanding and beyond.
Classifying cinematographic shot types
TLDR
This work investigates five different inherent characteristics of single shots which contain indirect information about camera distance, without the need to recover the 3D structure of the scene.
Learning realistic human actions from movies
TLDR
A new method for video classification that builds upon and extends several recent ideas, including local space-time features, space-time pyramids and multi-channel non-linear SVMs, is presented and shown to improve state-of-the-art results on the standard KTH action dataset.
A Local-to-Global Approach to Multi-Modal Movie Scene Segmentation
  • Anyi Rao, Linning Xu, +4 authors Dahua Lin
  • Computer Science
  • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2020
TLDR
This work builds a large-scale video dataset MovieScenes, which contains 21K annotated scene segments from 150 movies, and proposes a local-to-global scene segmentation framework, which integrates multi-modal information across three levels, i.e. clip, segment, and movie.
Human Mesh Recovery from Multiple Shots
TLDR
The key insight is that while shot changes within the same scene introduce a discontinuity between frames, the 3D structure of the scene still changes smoothly, which allows frames before and after the shot change to be handled as a multi-view signal that provides strong cues for recovering the 3D state of the actors.
A Dataset and Exploration of Models for Understanding Video Data through Fill-in-the-Blank Question-Answering
TLDR
This task is not solvable by a language model alone, and the model combining 2D and 3D visual information indeed provides the best result; however, all models perform significantly worse than human level.
Person Search in Videos with One Portrait Through Visual and Temporal Links
TLDR
A novel framework is proposed, which takes into account the identity invariance along a tracklet, thus allowing person identities to be propagated via both the visual and the temporal links and remarkably outperforms mainstream person re-id methods.
“Who are you?” - Learning person specific classifiers from video
TLDR
A character-specific multiple kernel classifier, which learns the features that best discriminate between the characters, is reported, demonstrating significantly increased coverage and performance with respect to previous methods on this material.