Robust and Interpretable Grounding of Spatial References with Relation Networks

Tsung-Yen Yang, Karthik Narasimhan
Learning representations of spatial references in natural language is a key challenge in tasks like autonomous navigation and robotic manipulation. Recent work has investigated various neural architectures for learning multi-modal representations for spatial concepts. However, the lack of explicit reasoning over entities makes such approaches vulnerable to noise in input text or state observations. In this paper, we develop effective models for understanding spatial references in text that are… 

A Hybrid Model of Classification and Generation for Spatial Relation Extraction

A novel hybrid model, HMCGR, is proposed that combines a generation model and a classification model: the former generates null-role relations, while the latter extracts non-null-role relations, so that the two complement each other.

Leveraging Language for Accelerated Learning of Tool Manipulation

This work investigates whether linguistic information about a tool can help control policies adapt faster to new tools for a given task, and demonstrates that combining linguistic information with meta-learning accelerates tool learning in several manipulation tasks, including pushing, lifting, sweeping, and hammering.

Learning to Execute Actions or Ask Clarification Questions

The Minecraft Corpus Dataset is extended by annotating all builder utterances into eight types, including clarification questions, and a new builder agent model capable of determining when to ask or execute instructions is proposed.

StepGame: A New Benchmark for Robust Multi-Hop Spatial Reasoning in Texts

A new Question-Answering dataset called StepGame is presented for robust multi-step spatial reasoning in texts and a Tensor-Product based Memory-Augmented Neural Network (TP-MANN) specialized for spatial reasoning tasks is proposed.

Grounding ‘Grounding’ in NLP

This work investigates the gap between definitions of “grounding” in NLP and Cognitive Science, and presents ways to create new tasks or repurpose existing ones in order to advance toward a more complete sense of grounding.

Encoding Spatial Relations from Natural Language

This work presents a system capable of capturing the semantics of spatial relations such as behind and left of from natural language, and demonstrates that its internal representations are robust to meaning-preserving transformations of descriptions and that viewpoint invariance is an emergent property of the system.

Learning Interpretable Spatial Operations in a Rich 3D Blocks World

This work introduces a new dataset that pairs complex 3D spatial operations with rich natural language descriptions requiring complex spatial and pragmatic interpretations, along with a new neural architecture that achieves competitive results while automatically discovering an inventory of interpretable spatial operations.

Representation Learning for Grounded Spatial Reasoning

This work considers the task of spatial reasoning in a simulated environment, where an agent can act and receive rewards, and proposes a model that learns a representation of the world steered by instruction text, outperforming state-of-the-art approaches on several metrics.

TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments

This work introduces the Touchdown task and dataset, where an agent must first follow navigation instructions in a Street View environment to a goal position, and then guess a location in its observed environment described in natural language to find a hidden object.

Grounding language in perception: A connectionist model of spatial terms and vague quantifiers

A new connectionist model of spatial language, based on real psycholinguistic data, combines various constraints on object knowledge and object localisation to influence the comprehension of a range of linguistic terms, mirroring what participants do in experiments.

Disentangled Relational Representations for Explaining and Learning from Demonstration

This work proposes a method in which a learning agent utilizes the information bottleneck layer of a high-parameter variational neural model, with auxiliary loss terms, in order to ground abstract concepts such as spatial relations in a photorealistic synthetic environment.

Interactive Grounded Language Acquisition and Generalization in a 2D World

The proposed model significantly outperforms five comparison methods for interpreting zero-shot sentences and demonstrates human-interpretable intermediate outputs of the model in the appendix.

Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation

This work highlights shortcomings of current metrics for the Room-to-Room dataset and proposes a new metric, Coverage weighted by Length Score (CLS), and shows that agents that receive rewards for instruction fidelity outperform agents that focus on goal completion.

Learning to Follow Navigational Directions

A system that learns to follow natural language navigational directions by apprenticeship from routes through a map paired with English descriptions, using a reinforcement learning algorithm that grounds the meaning of spatial terms like above and south in geometric properties of paths.

Grounding spatial language in perception: an empirical and computational investigation.

The authors conclude that the structure of linguistic spatial categories can be partially explained in terms of independently motivated perceptual processes.