• Corpus ID: 2168019

Learning Articulated Motion Models from Visual and Lingual Signals

Zhengyang Wu, Mohit Bansal, Matthew R. Walter
arXiv: Robotics
In order for robots to operate effectively in homes and workplaces, they must be able to manipulate the articulated objects common within environments built for and by humans. Previous work learns kinematic models that prescribe this manipulation from visual demonstrations. Lingual signals, such as natural language descriptions and instructions, offer a complementary means of conveying knowledge of such manipulation models and are suitable to a wide range of interactions (e.g., remote… 

Learning Articulated Motions From Visual Demonstration
Describes a method by which a robot acquires an object model by capturing depth imagery of the object as a human moves it through its range of motion, then uses the model to predict the object's motion from a novel vantage point.
Learning models for following natural language directions in unknown environments
A novel learning framework is proposed that enables robots to successfully follow natural language route directions without any previous knowledge of the environment by learning and performing inference over a latent environment model.
A Probabilistic Framework for Learning Kinematic Models of Articulated Objects
This work presents a novel, probabilistic framework for modeling articulated objects as kinematic graphs, and demonstrates that this approach has a broad set of applications, in particular for the emerging fields of mobile manipulation and service robotics.
Understanding Natural Language Commands for Robotic Navigation and Mobile Manipulation
This paper describes a new model for understanding natural language commands given to autonomous systems that perform navigation and mobile manipulation in semi-structured environments. The approach dynamically instantiates a probabilistic graphical model for a particular natural language command according to the command's hierarchical and compositional semantic structure.
Inferring Maps and Behaviors from Natural Language Instructions
This paper proposes a probabilistic framework that enables robots to follow commands given in natural language without any prior knowledge of the environment, and demonstrates the algorithm's ability to follow navigation commands with performance comparable to that achieved with a fully known environment.
Learning Semantic Maps from Natural Language Descriptions
Proposes an algorithm that enables robots to efficiently learn human-centric models of their environment from natural language descriptions, increasing the metric, topological, and semantic accuracy of the recovered environment model.
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
This work introduces the structure-content neural language model, which disentangles the structure of a sentence from its content, conditioned on representations produced by the encoder, and shows that with linear encoders, the learned embedding space captures multimodal regularities in terms of vector space arithmetic.
Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences
This work introduces a multi-level aligner that empowers the alignment-based encoder-decoder model with long short-term memory recurrent neural networks (LSTM-RNN) to translate natural language instructions to action sequences based upon a representation of the observable world state.
Show and tell: A neural image caption generator
This paper presents a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image.
Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions
This paper presents a grounded CCG semantic parsing approach that learns a joint model of meaning and context for interpreting and executing natural language instructions, using various types of weak supervision.