Corpus ID: 14417013

A system that learns to describe objects in visual scenes

  • D. Roy
  • Published in INTERSPEECH 2002
  • Computer Science
A spoken language generation system has been developed that learns to describe objects in computer-generated visual scenes. The system is trained by a ‘show-and-tell’ procedure in which visual scenes are paired with natural language descriptions. A set of learning algorithms acquire probabilistic structures which encode the visual semantics of phrase structure, word classes, and individual words. Using these structures, a planning algorithm integrates syntactic, semantic, and contextual…


Learning visually grounded words and syntax for a scene description task
  • D. Roy
  • Computer Science
  • Comput. Speech Lang.
  • 2002
A spoken language generation system that learns to describe objects in computer-generated visual scenes. It generates syntactically well-formed compound adjective noun phrases as well as relative spatial clauses, and its output was comparable to human-generated descriptions.
Learning visually grounded words and syntax of natural spoken language
This paper advocates the creation of physically grounded language learning machines as a path toward scalable systems which can conceptualize and communicate about the world in human-like ways.
Grounded spoken language acquisition: experiments in word learning
  • D. Roy
  • Computer Science
  • IEEE Trans. Multim.
  • 2003
Inspired by theories of infant cognition, this work presents a computational model that learns words from untranscribed acoustic and video input. The model is implemented in a real-time robotic system that performs interactive language learning and understanding.
VIsual TRAnslator: Linking perceptions and natural language descriptions
This work reports practical experience gained in the project Vitra concerning the design and construction of integrated knowledge-based systems capable of translating visual information into natural language descriptions.
Learning Attribute Selections for Non-Pronominal Expressions
This paper reports results from using machine learning to train and test a nominal-expression generator on a set of 393 nominal descriptions from the COCONUT corpus of task-oriented design dialogues.
Generating coherent presentations employing textual and visual material
This work first shows that multimedia presentations and pure text follow similar structuring principles, and sketches how techniques for planning text and discourse can be generalized to allow the structure and contents of multimedia communications to be planned as well.
Generating referring expressions - constructing descriptions in a domain of objects and processes
  • R. Dale
  • Computer Science
  • ACL-MIT press series in natural language processing
  • 1992
Part 1 Introduction: what this book is about; the phenomena considered; the aims of the work; starting points; an overview of the system; structure of the book. Part 2 The representation of entities: some…