Corpus ID: 231632397

Narration Generation for Cartoon Videos

Nikos Papasarantopoulos, Shay B. Cohen
Research on text generation from multimodal inputs has largely focused on static images, and less on video data. In this paper, we propose a new task, narration generation, which consists of complementing videos with narration texts that are to be interjected in several places. The narrations are part of the video and contribute to the storyline unfolding in it. Moreover, they are context-informed, since they include information appropriate for the timeframe of the video they cover, and also, do not need to…


Using Descriptive Video Services to Create a Large Data Source for Video Annotation Research
An automatic DVS segmentation and alignment method for movies is described that enables us to scale up the collection of a DVS-derived dataset with minimal human intervention.
Grounding Action Descriptions in Videos
A general purpose corpus is presented that aligns high quality videos with multiple natural language descriptions of the actions portrayed in the videos, together with an annotation of how similar the action descriptions are to each other.
A dataset for Movie Description
Comparing ADs to scripts, it is found that ADs are far more visual and describe precisely what is shown rather than what should happen according to the scripts created prior to movie production.
Get To The Point: Summarization with Pointer-Generator Networks
A novel architecture that augments the standard sequence-to-sequence attentional model in two orthogonal ways, using a hybrid pointer-generator network that can copy words from the source text via pointing, which aids accurate reproduction of information, while retaining the ability to produce novel words through the generator.
MovieQA: Understanding Stories in Movies through Question-Answering
The MovieQA dataset, which aims to evaluate automatic story comprehension from both video and text, is introduced and existing QA techniques are extended to show that question-answering with such open-ended semantics is hard.
Video Question Answering via Gradually Refined Attention over Appearance and Motion
This paper proposes an end-to-end model which gradually refines its attention over the appearance and motion features of the video using the question as guidance and demonstrates the effectiveness of the model by analyzing the refined attention weights during the question answering procedure.
Generating Natural Questions About an Image
This paper introduces the novel task of Visual Question Generation, where the system is tasked with asking a natural and engaging question when shown an image, and provides three datasets which cover a variety of images from object-centric to event-centric.
Coherent Multi-sentence Video Description with Variable Level of Detail
This paper follows a two-step approach where it first learns to predict a semantic representation (SR) from video and then generates natural language descriptions from it, and models across-sentence consistency at the level of the SR by enforcing a consistent topic.
Automated Story Selection for Color Commentary in Sports
It is shown that commentary using SCoReS adds significantly to the broadcast across several enjoyment metrics and is a step toward automating sports commentary and, thus, automating narrative.
What’s This Movie About? A Joint Neural Network Architecture for Movie Content Analysis
This work presents a novel end-to-end model for overview generation, consisting of a multi-label encoder for identifying screenplay attributes, and an LSTM decoder to generate natural language sentences conditioned on the identified attributes.