• Corpus ID: 9014123

Learning Multi-Modal Grounded Linguistic Semantics by Playing "I Spy"

@inproceedings{Thomason2016LearningMG,
  title={Learning Multi-Modal Grounded Linguistic Semantics by Playing "I Spy"},
  author={Jesse Thomason and Jivko Sinapov and Maxwell Svetlik and Peter Stone and Raymond J. Mooney},
  booktitle={IJCAI},
  year={2016}
}
Grounded language learning bridges words like 'red' and 'square' with robot perception. The vast majority of existing work in this space limits robot perception to vision. In this paper, we build perceptual models that use haptic, auditory, and proprioceptive data acquired through robot exploratory behaviors to go beyond vision. Our system learns to ground natural language words describing objects using supervision from an interactive human-robot "I Spy" game. In this game, the human and robot…
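The abstract describes grounding each word in features gathered across several exploratory behaviors and sensory modalities, with positive and negative object labels coming from gameplay. Below is a minimal sketch of that idea, not the authors' implementation: the (behavior, modality) contexts, the `train_word_classifiers` and `word_score` helpers, and the use of linear SVMs from scikit-learn are all illustrative assumptions.

```python
# Minimal sketch (not the paper's code) of grounding a word in multi-modal
# percepts: one binary classifier per (behavior, modality) context, combined
# by averaging decision scores. The context list, data layout, and helper
# names are assumptions made for illustration.
import numpy as np
from sklearn.svm import SVC

CONTEXTS = [("look", "vision"), ("grasp", "haptics"),
            ("lift", "proprioception"), ("shake", "audio")]

def train_word_classifiers(features, labels):
    """features: {object_id: {context: 1-D np.ndarray}}
    labels: {object_id: +1/-1}, e.g. derived from "I Spy" guesses and corrections."""
    classifiers = {}
    for ctx in CONTEXTS:
        X = np.vstack([features[o][ctx] for o in labels])
        y = np.array([labels[o] for o in labels])
        if len(set(y)) < 2:          # need both positive and negative examples
            continue
        classifiers[ctx] = SVC(kernel="linear").fit(X, y)
    return classifiers

def word_score(classifiers, object_feats):
    """Average decision values across contexts; >0 means the word likely applies."""
    scores = [clf.decision_function(object_feats[ctx].reshape(1, -1))[0]
              for ctx, clf in classifiers.items()]
    return float(np.mean(scores)) if scores else 0.0
```

In such a setup, an object whose averaged score is above zero would be treated as a plausible referent for the word during the guessing phase of the game.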
Guiding Exploratory Behaviors for Multi-Modal Grounding of Linguistic Descriptions
TLDR
This paper proposes a method for guiding a robot's behavioral exploration policy when learning a novel predicate based on known grounded predicates and the novel predicate’s linguistic relationship to them and demonstrates the approach on two datasets.
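As a rough illustration of guiding exploration by a novel predicate's linguistic relationship to known ones, the sketch below ranks behaviors by transferring behavior relevance from linguistically similar known predicates via cosine similarity of word vectors. The data structures and the `rank_behaviors` helper are assumptions for the example, not the paper's method.

```python
# Illustrative sketch: score each behavior for a novel predicate by a
# similarity-weighted average of its relevance to known predicates.
import numpy as np

def rank_behaviors(novel_vec, known_vecs, behavior_relevance):
    """novel_vec: embedding of the new word.
    known_vecs: {word: embedding}
    behavior_relevance: {word: {behavior: relevance score}}."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    sims = {w: cos(novel_vec, v) for w, v in known_vecs.items()}
    behaviors = {b for rel in behavior_relevance.values() for b in rel}
    scores = {}
    for b in behaviors:
        num = sum(sims[w] * behavior_relevance[w].get(b, 0.0) for w in sims)
        den = sum(abs(sims[w]) for w in sims) or 1.0
        scores[b] = num / den
    return sorted(scores, key=scores.get, reverse=True)
```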
Guiding Interaction Behaviors for Multi-modal Grounded Language Learning
TLDR
This work gathers behavior annotations from humans and demonstrates that these improve language grounding performance by allowing a system to focus on relevant behaviors for words like “white” or “half-full” that can be understood by looking or lifting, respectively.
Interactive Learning of Grounded Verb Semantics towards Human-Robot Communication
TLDR
A new interactive learning approach is presented that allows robots to proactively engage human partners by asking good questions to learn models for grounded verb semantics; reinforcement learning is used so the robot can acquire an optimal policy for its question-asking behaviors by maximizing long-term reward (a generic policy-learning sketch follows below).
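For readers unfamiliar with how such a question-asking policy could be learned, here is a generic tabular Q-learning sketch; the states, the action set of question types, and the reward signal are placeholders assumed for the example, not the paper's actual formulation.

```python
# Generic tabular Q-learning over question-asking actions (placeholder
# states/actions/rewards; illustrative only).
import random
from collections import defaultdict

ACTIONS = ["ask_label", "ask_demonstration", "ask_feature", "execute"]

def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, eps=0.1):
    """env must provide reset() -> state and step(action) -> (state, reward, done)."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = (random.choice(ACTIONS) if random.random() < eps
                 else max(ACTIONS, key=lambda x: Q[(s, x)]))
            s2, r, done = env.step(a)
            best_next = max(Q[(s2, x)] for x in ACTIONS)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q
```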
Jointly Improving Parsing and Perception for Natural Language Commands through Human-Robot Dialog
TLDR
Methods for using human-robot dialog to improve language understanding for a mobile robot agent that parses natural language to underlying semantic meanings and uses robotic sensors to create multi-modal models of perceptual concepts like red and heavy are presented.
Sensorimotor Cross-Behavior Knowledge Transfer for Grounded Category Recognition
TLDR
The results show that the proposed framework can enable a target robot to perform category recognition on a set of novel objects and categories without the need to physically interact with the objects to learn the categorization model.
Improving Grounded Natural Language Understanding through Human-Robot Dialog
TLDR
This work presents an end-to-end pipeline for translating natural language commands to discrete robot actions, and uses clarification dialogs to jointly improve language parsing and concept grounding.
On the Multisensory Nature of Objects and Language: A Robotics Perspective
  • J. Sinapov
  • Psychology
    1st International Workshop on Multimodal Understanding and Learning for Embodied Applications - MULEA '19
  • 2019
TLDR
Results from several large-scale experimental studies are highlighted which show that the behavior-grounded object representation enables a robot to solve a wide variety of perceptual and cognitive tasks relevant to object learning.
Object Affordance Learning from Human Descriptions Human Robot Interaction Final Project Report
TLDR
A robot asks the human to describe objects in terms of their properties and functionalities and builds object models from this feedback; the report presents a prototype of the proposed system and the results of a user study.
A Generalized Model for Multimodal Perception
TLDR
A Conditional Random Field (CRF) based approach is developed to fuse visual and verbal modalities, with n-ary relations (or descriptions) as factor functions; it is hypothesized that human descriptions of an environment will improve the robot's recognition if the information can be properly fused.
Grounding Symbols in Multi-Modal Instructions
TLDR
A method for processing a raw stream of cross-modal input to produce a segmentation of objects with corresponding associations to high-level concepts; results show the model learns the user's notion of colour and shape from a small number of physical demonstrations.
...

References

SHOWING 1-10 OF 50 REFERENCES
A probabilistic approach to learning a visually grounded language model through human-robot interaction
  • H. Dindo, D. Zambuto
  • Computer Science
    2010 IEEE/RSJ International Conference on Intelligent Robots and Systems
  • 2010
TLDR
A novel probabilistic model is presented, inspired by the findings in cognitive sciences, able to associate spoken words with their perceptually grounded meanings, and it enables a robotic platform to learn grounded meanings of adjective/noun terms.
Learning Visually Grounded Words and Syntax of Natural Spoken Language
TLDR
This paper advocates the creation of physically grounded language learning machines as a path toward scalable systems which can conceptualize and communicate about the world in human-like ways.
Grounding the Meaning of Words through Vision and Interactive Gameplay
TLDR
I Spy is an effective approach for teaching robots how to model new concepts using representations comprised of visual attributes, and a model evaluation showed that the system correctly understood the visual representations of its learned concepts with an average of 65% accuracy.
Toward Interactive Grounded Language Acqusition
TLDR
This paper extends Logical Semantics with Perception to incorporate determiners (e.g., “the”) into its training procedure, enabling the model to generate acceptable relational language 20% more often than the unaugmented model.
Grounding semantic categories in behavioral interactions: Experiments with 100 objects
SALL-E: Situated Agent for Language Learning
TLDR
The method of retrieving object examples with a k-nearest neighbor classifier using Mahalanobis distance corresponds to a cognitively plausible representation of objects, and initial results show promise for achieving rapid, near one-shot, incremental learning of word meanings.
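The retrieval step described here, nearest-neighbor lookup under Mahalanobis distance, is concrete enough to sketch. The snippet below is an illustrative implementation under assumed data layout, not the SALL-E system's code.

```python
# Minimal sketch of retrieving the k nearest stored object examples under
# Mahalanobis distance (illustrative assumptions about the feature matrix).
import numpy as np

def mahalanobis_knn(query, examples, k=3):
    """query: (d,) feature vector; examples: (n, d) array of stored examples."""
    cov = np.cov(examples, rowvar=False)
    inv_cov = np.linalg.pinv(cov)          # pseudo-inverse guards against singular cov
    diffs = examples - query
    dists = np.sqrt(np.einsum("ij,jk,ik->i", diffs, inv_cov, diffs))
    return np.argsort(dists)[:k]           # indices of the k closest examples
```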
A Joint Model of Language and Perception for Grounded Attribute Learning
TLDR
This work presents an approach for joint learning of language and perception models for grounded attribute induction, which includes a language model based on a probabilistic categorial grammar that enables the construction of compositional meaning representations.
Jointly Learning to Parse and Perceive: Connecting Natural Language to the Physical World
TLDR
This paper introduces Logical Semantics with Perception (LSP), a model for grounded language acquisition that learns to map natural language statements to their referents in a physical environment and finds that LSP outperforms existing, less expressive models that cannot represent relational language.
Eye Spy: Improving Vision through Dialog
TLDR
A robotic dialog system that learns names and attributes of objects through spoken interaction with a human teacher, using a variant of the children's games "I Spy" and "20 Questions", is reported.
...