Corpus ID: 173991156

Visual Understanding and Narration: A Deeper Understanding and Explanation of Visual Scenes

@article{Lukin2019VisualUA,
  title={Visual Understanding and Narration: A Deeper Understanding and Explanation of Visual Scenes},
  author={Stephanie M. Lukin and Claire Bonial and Clare R. Voss},
  journal={ArXiv},
  year={2019},
  volume={abs/1906.00038}
}
We describe the task of Visual Understanding and Narration, in which a robot (or agent) generates text for the images that it collects when navigating its environment, by answering open-ended questions, such as 'what happens, or might have happened, here?' 
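
To make the task setup concrete, the sketch below shows an agent-style loop that captures images while navigating and attaches both a literal description and an open-ended narration to each one. It is a minimal illustration under assumed interfaces; describe_scene and narrate are hypothetical placeholders, not components of the authors' system.

```python
# Minimal sketch of a visual understanding and narration loop.
# describe_scene() and narrate() are hypothetical placeholders used only
# to illustrate the task; they are not the authors' implementation.

from dataclasses import dataclass
from typing import List


@dataclass
class SceneNarration:
    image_id: str
    description: str  # literal answer to "what is here?"
    narration: str    # open-ended answer to "what happens, or might have happened, here?"


def describe_scene(image) -> str:
    """Placeholder for an object/scene recognition component."""
    return "a hallway with an overturned chair"


def narrate(description: str) -> str:
    """Placeholder for a component that speculates about past or ongoing events."""
    return f"Someone may have left in a hurry, knocking the chair over in {description}."


def navigate_and_narrate(images) -> List[SceneNarration]:
    """Process each image the robot collects while navigating its environment."""
    results = []
    for i, image in enumerate(images):
        desc = describe_scene(image)
        results.append(SceneNarration(f"img-{i}", desc, narrate(desc)))
    return results
```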

References

A Pipeline for Creative Visual Storytelling

This paper presents a pipeline of task modules, Object Identification, Single-Image Inferencing, and Multi-Image Narration, that serves as a preliminary design for building a creative visual storyteller, and pilots this design on a sequence of images in an annotation task.
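
Read as an architecture, the three task modules compose into a single image-to-story function. The sketch below wires them together; the function names, signatures, and toy return values are assumptions for illustration, not the paper's actual interfaces.

```python
# Hedged sketch of the three-stage pipeline named above. Function names,
# signatures, and toy outputs are illustrative assumptions, not the paper's API.

from typing import List


def identify_objects(image) -> List[str]:
    """Stage 1: Object Identification -- list salient entities in one image."""
    return ["dog", "frisbee", "park bench"]


def infer_single_image(objects: List[str]) -> str:
    """Stage 2: Single-Image Inferencing -- hypothesize what is happening."""
    return f"A {objects[0]} chases a {objects[1]} near a {objects[2]}."


def narrate_sequence(inferences: List[str]) -> str:
    """Stage 3: Multi-Image Narration -- stitch per-image inferences into one story."""
    return " Then ".join(inferences)


def tell_story(images) -> str:
    return narrate_sequence([infer_single_image(identify_objects(img)) for img in images])
```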

DramaBank: Annotating Agency in Narrative Discourse

DramaBank is a collection project, amenable to corpus annotation, that includes encodings of texts ranging from short fables to epic poetry and contemporary nonfiction.

Visual Storytelling

Modelling concrete description as well as figurative and social language, as provided in this dataset and the storytelling task, has the potential to move artificial intelligence from basic understandings of typical visual scenes towards more and more human-like understanding of grounded event structure and subjective expression.

SNAG: Spoken Narratives and Gaze Dataset

A new multimodal dataset consisting of gaze measurements and spoken descriptions collected in parallel during an image inspection task is described, and its usefulness is demonstrated by applying an existing visual-linguistic data fusion framework to label important image regions with appropriate linguistic labels.

Learning knowledge to support domain-independent narrative intelligence

This dissertation follows the three-tier model and proposes methods to generate the fabula, sjuzhet, and text, respectively, which are derived from artificially segmented perceptual inputs of the human senses.

Evaluation, Orientation, and Action in Interactive Storytelling

The paper shows that EVALUATION clauses can be automatically distinguished from ORIENTATION and ACTION clauses with 89% accuracy in fables, suggesting that it will be possible to develop new types of data-driven stories using Labov and Waletzky's (L&W) typology.
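
As a point of comparison only, a generic supervised baseline for this kind of clause labeling might look like the sketch below; the toy data, features, and classifier are assumptions, not the method or features behind the accuracy reported in the paper.

```python
# Generic supervised baseline for labeling narrative clauses as EVALUATION,
# ORIENTATION, or ACTION. Toy data and model choice are illustrative only;
# this is not the paper's method.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled clauses (clause text, L&W clause type).
training = [
    ("Once upon a time a fox lived at the edge of the forest", "ORIENTATION"),
    ("The fox leapt at the grapes", "ACTION"),
    ("It was a foolish thing to try", "EVALUATION"),
    ("The crow perched on a high branch", "ORIENTATION"),
    ("She dropped the cheese", "ACTION"),
    ("Flattery is never to be trusted", "EVALUATION"),
]
texts, labels = zip(*training)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(list(texts), list(labels))

print(model.predict(["The fox trotted away"]))  # e.g. ['ACTION']
```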

Narratological Knowledge for Natural Language Generation

The paper proposes an architecture for advanced NLG systems that handle narratives, and exemplifies domain modelling and meta-knowledge modelling for a narratological structurer.

Plot Units and Narrative Summarization

A technique of memory representation based on plot units appears to provide a rich foundation for a high-level analysis of the story that highlights its central concepts.

Cooperating with Avatars Through Gesture, Language and Action

This paper explores peer-to-peer communication between people and machines in the context of a blocks world, which serves as a surrogate for cooperative tasks in which the partners share a workspace, and finds that ambiguities flip the conversational lead.

On the Representation of Inferences and their Lexicalization

A more effective and efficient way to marshal inferences from background knowledge to facilitate deep natural language understanding is developed and implemented on real text.