centered on a large database, but in this case it is entirely of living organisms, the marine bivalves. Over 28,000 records of bivalve genera and subgenera from 322 locations around the world have now been compiled by these authors, giving a global record of some 854 genera and subgenera and 5132 species. No fossils are included in the database, but because…
This paper presents a model of word acquisition which learns from multimodal sensory input. Set in an information theoretic framework, the model acquires a lexicon by finding and statistically modeling consistent inter-modal structure. Learning is achieved from multimodal sensor data without any human annotation. An implementation of this model was able to…
Speaking using unconstrained natural language is an intuitive and flexible way for humans to interact with robots. Understanding this kind of linguistic input is challenging because diverse words and phrases must be mapped into structures that the robot can understand, and elements in those structures must be grounded in an uncertain environment. We present…
We present a visually-grounded language understanding model based on a study of how people verbally describe objects in scenes. The emphasis of the model is on the combination of individual word meanings to produce meanings for complex referring expressions. The model has been implemented, and it is able to understand a broad range of spatial referring…
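The idea of composing individual word meanings into the meaning of a referring expression can be sketched in a toy form: the head noun acts as a filter over scene objects, and a spatial relation further constrains the candidates against a landmark. This is a minimal illustration under assumed object and relation representations, not the paper's actual model:

```python
def ground_referring_expression(objects, head_noun, relation=None, landmark_noun=None):
    # Compose word meanings: the head noun filters the candidate set,
    # then the spatial relation constrains candidates against a landmark.
    candidates = [o for o in objects if o["name"] == head_noun]
    if relation is None:
        return candidates
    landmark = next(o for o in objects if o["name"] == landmark_noun)
    if relation == "left_of":
        # Toy geometric meaning of "left of": smaller x-coordinate.
        return [o for o in candidates if o["x"] < landmark["x"]]
    raise ValueError(f"unknown relation: {relation}")

# Example: "the ball left of the box" in a three-object scene.
scene = [{"name": "ball", "x": 1}, {"name": "ball", "x": 5}, {"name": "box", "x": 3}]
referent = ground_referring_expression(scene, "ball", "left_of", "box")
```

The object dictionaries, the `left_of` test, and the two-argument relation structure here are all simplifying assumptions; the actual model grounds word meanings in richer visual features.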
A theoretical framework for grounding language is introduced that provides a computational path from sensing and motor action to words and speech acts. The approach combines concepts from semiotics and schema theory to develop a holistic approach to linguistic meaning. Schemas serve as structured beliefs that are grounded in an agent’s physical environment…
  • Deb Roy
  • Trends in cognitive sciences
  • 2005
We use words to communicate about things and kinds of things, their properties, relations and actions. Researchers are now creating robotic and simulated systems that ground language in machine perception and action, mirroring human abilities. A new kind of computational model is emerging from this work that bridges the symbolic realm of language with the…
Fuse is a situated spoken language understanding system that uses visual context to steer the interpretation of speech. Given a visual scene and a spoken description, the system finds the object in the scene that best fits the meaning of the description. To solve this task, Fuse performs speech recognition and visually-grounded language understanding.…
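The "best fit" selection step described above can be sketched generically: given a per-word likelihood that a word applies to an object, score every scene object by the combined (log) likelihood of the description and pick the maximum. The likelihood function and object representation below are placeholders, not Fuse's actual visual models:

```python
import math

def best_fit_object(scene_objects, words, word_likelihood):
    # word_likelihood(word, obj) -> probability in (0, 1] that `word`
    # describes `obj`. Sum log-likelihoods over the description and
    # return the object with the highest combined score.
    def score(obj):
        return sum(math.log(max(word_likelihood(w, obj), 1e-9)) for w in words)
    return max(scene_objects, key=score)

# Toy usage: objects carry per-word applicability scores directly.
objects = [
    {"id": "cup", "red": 0.9, "small": 0.8},
    {"id": "plate", "red": 0.2, "small": 0.3},
]
likelihood = lambda w, o: o.get(w, 0.01)
best = best_fit_object(objects, ["red", "small"], likelihood)
```

In the real system the word likelihoods come from visually grounded models evaluated on image features, and the interpretation also feeds back into speech recognition.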
Language is grounded in sensory-motor experience. Grounding connects concepts to the physical world, enabling humans to acquire and use words and sentences in context. Currently most machines that process language are not grounded. Instead, semantic representations are abstract, pre-specified, and have meaning only when interpreted by humans. We are…
We present Tweet2Vec, a novel method for generating general-purpose vector representations of tweets. The model learns tweet embeddings using a character-level CNN-LSTM encoder-decoder. We trained our model on 3 million randomly selected English-language tweets. The model was evaluated using two methods: tweet semantic similarity and tweet sentiment…
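The character-level input stage of such an encoder-decoder can be sketched as follows. The alphabet and the OOV convention here are assumptions for illustration, not the published preprocessing; the encoder would consume these integer ids as an input sequence:

```python
def char_encode(tweet, alphabet="abcdefghijklmnopqrstuvwxyz0123456789 #@"):
    # Map each character of a tweet to an integer id (1-based);
    # characters outside the assumed alphabet map to 0 (out-of-vocabulary).
    char_to_id = {c: i + 1 for i, c in enumerate(alphabet)}
    return [char_to_id.get(c, 0) for c in tweet.lower()]
```

Working at the character level keeps the vocabulary tiny and makes the model robust to the misspellings, hashtags, and @-mentions common in tweets, which is the motivation for a character-level encoder in the first place.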
Our long-term objective is to develop robots that engage in natural language-mediated cooperative tasks with humans. To support this goal, we are developing an amodal representation, with associated processes, called a grounded situation model (GSM). We are also developing a modular architecture in which the GSM resides in a centrally located module,…