What do navigation agents learn about their environment?

Kshitij Dwivedi, Gemma Roig, Aniruddha Kembhavi, Roozbeh Mottaghi
Today’s state-of-the-art visual navigation agents typically consist of large deep learning models trained end to end. Such models offer little to no interpretability about the skills they learn or the actions they take in response to their environment. While past works have explored interpreting deep learning models, little attention has been devoted to interpreting embodied AI systems, which often involve reasoning about the structure of the environment, target characteristics and the…


Learning Object Relation Graph and Tentative Policy for Visual Navigation
Proposes three complementary techniques: an object relation graph (ORG), trial-driven imitation learning (IL), and a memory-augmented tentative policy network (TPN). The ORG improves visual representation learning by integrating object relationships, including category closeness and spatial correlations.
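As a rough illustration of the relation-graph idea (not the paper's actual implementation), a graph layer can mix per-object features according to a relation matrix; the function name and shapes below are hypothetical:

```python
import numpy as np

def org_aggregate(obj_feats: np.ndarray, relation: np.ndarray) -> np.ndarray:
    """Aggregate object features with a category-relation graph.

    obj_feats: (num_objects, feat_dim) per-object visual features.
    relation:  (num_objects, num_objects) relation weights, e.g. learned
               or derived from category co-occurrence.
    Returns relation-aware features of the same shape.
    """
    # Row-normalize so each object receives a convex combination
    # of its neighbours' features.
    norm = relation / relation.sum(axis=1, keepdims=True)
    return norm @ obj_feats

# Toy example: 3 detected objects with 4-dim features.
feats = np.eye(3, 4)
rel = np.ones((3, 3))   # uniform relations reduce to simple averaging
out = org_aggregate(feats, rel)
```

With uniform relations every output row is just the mean feature, which makes the normalization easy to sanity-check.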
AllenAct: A Framework for Embodied AI Research
Introduces AllenAct, a modular and flexible learning framework designed around the unique requirements of Embodied AI research; it provides first-class support for a growing collection of embodied environments, tasks and algorithms.
Visual Representations for Semantic Target Driven Navigation
This work proposes to use semantic segmentation and detection masks as observations obtained by state-of-the-art computer vision algorithms and use a deep network to learn navigation policies on top of representations that capture spatial layout and semantic contextual cues.
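A simplified version of that observation design (my own naming, not the paper's API) converts a per-pixel class map into one-hot channel masks that a policy network can consume alongside the RGB frame:

```python
import numpy as np

def masks_from_segmentation(seg: np.ndarray, num_classes: int) -> np.ndarray:
    """Turn an (H, W) integer class map into (num_classes, H, W) binary masks."""
    return (np.arange(num_classes)[:, None, None] == seg[None]).astype(np.float32)

# Toy 2x2 segmentation with classes 0..2.
seg = np.array([[0, 1],
                [2, 1]])
obs = masks_from_segmentation(seg, num_classes=3)
```

Each pixel lands in exactly one channel, so the mask tensor sums to the number of pixels.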
Target-driven visual navigation in indoor scenes using deep reinforcement learning
This paper proposes an actor-critic model whose policy is a function of the goal as well as the current state, allowing better generalization, and introduces the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine.
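The goal-conditioned policy idea can be sketched as a tiny network whose input is the concatenation of state and goal embeddings; the layer sizes and random weights here are placeholders, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

class GoalConditionedActorCritic:
    """Policy pi(a | s, g) and value V(s, g) over a shared hidden layer."""

    def __init__(self, state_dim, goal_dim, hidden, num_actions):
        self.w_h = rng.normal(size=(state_dim + goal_dim, hidden)) * 0.1
        self.w_pi = rng.normal(size=(hidden, num_actions)) * 0.1
        self.w_v = rng.normal(size=(hidden, 1)) * 0.1

    def forward(self, state, goal):
        # The goal is an input to the policy, so one model serves many targets.
        x = np.concatenate([state, goal])
        h = np.tanh(x @ self.w_h)
        logits = h @ self.w_pi
        probs = np.exp(logits - logits.max())   # stable softmax
        probs /= probs.sum()
        value = float((h @ self.w_v)[0])
        return probs, value

model = GoalConditionedActorCritic(state_dim=8, goal_dim=4, hidden=16, num_actions=4)
probs, value = model.forward(np.ones(8), np.zeros(4))
```

Swapping the goal vector changes the action distribution without retraining a per-target policy, which is the generalization benefit the summary refers to.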
MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation
This work proposes the multiON task, which requires navigation to an episode-specific sequence of objects in a realistic environment and generalizes the ObjectGoal navigation task and explicitly tests the ability of navigation agents to locate previously observed goal objects.
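The ordered-goals requirement can be illustrated with a small success check; this interface is hypothetical and is not the benchmark's actual API:

```python
def multion_success(goal_sequence, visit_log):
    """True if the agent reached every goal in the episode's given order.

    goal_sequence: ordered list of target object ids for the episode.
    visit_log: object ids in the order the agent declared them found.
    Ordered-subsequence matching: later goals must appear after earlier ones.
    """
    it = iter(visit_log)
    return all(any(v == g for v in it) for g in goal_sequence)

ok = multion_success(["chair", "plant"], ["chair", "table", "plant"])
bad = multion_success(["chair", "plant"], ["plant", "chair"])
```

Because the same iterator is consumed across goals, finding the goals out of order fails even though both objects were visited.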
Occupancy Anticipation for Efficient Exploration and Navigation
This work proposes occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions, which facilitates efficient exploration and navigation in 3D environments.
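To show the map interface the summary describes, here is a degenerate stand-in for the learned anticipation model: it labels unknown cells bordering observed free space as free, where the real system would predict such labels from egocentric RGB-D features. Cell labels and the grid layout are my own assumptions:

```python
import numpy as np

UNKNOWN, FREE, OCCUPIED = 0, 1, 2

def anticipate(occ: np.ndarray) -> np.ndarray:
    """Fill UNKNOWN cells that border a FREE cell with FREE.

    A trivial nearest-neighbour rule in place of the learned predictor;
    it only demonstrates extending the map beyond observed regions.
    """
    out = occ.copy()
    h, w = occ.shape
    for i in range(h):
        for j in range(w):
            if occ[i, j] != UNKNOWN:
                continue
            neigh = occ[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
            if (neigh == FREE).any():
                out[i, j] = FREE
    return out

grid = np.array([[FREE,    UNKNOWN, UNKNOWN],
                 [UNKNOWN, UNKNOWN, OCCUPIED]])
pred = anticipate(grid)
```

The anticipated map marks more of the grid traversable than was directly observed, which is what lets a planner commit to longer paths earlier.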
ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects
This document summarizes the consensus recommendations of a working group on ObjectNav, covering subtle but important details of evaluation criteria, the agent's embodiment parameters, and the characteristics of the environments within which the task is carried out.
Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments
This work provides the first benchmark dataset for visually-grounded natural language navigation in real buildings, the Room-to-Room (R2R) dataset, and presents the Matterport3D Simulator, a large-scale reinforcement learning environment based on real imagery.
Visual Navigation with Spatial Attention
The attention model is shown to improve the agent’s policy and to achieve state-of-the-art results on commonly-used datasets.
ManipulaTHOR: A Framework for Visual Object Manipulation
This work proposes a framework for object manipulation built upon the physics-enabled, visually rich AI2-THOR framework and presents a new challenge to the Embodied AI community known as ArmPointNav, which extends the popular point navigation task to object manipulation and offers new challenges including 3D obstacle avoidance.