Corpus ID: 195791482

Chasing Ghosts: Instruction Following as Bayesian State Tracking

@inproceedings{Anderson2019ChasingGI,
  title={Chasing Ghosts: Instruction Following as Bayesian State Tracking},
  author={Peter Anderson and Ayush Shrivastava and Devi Parikh and Dhruv Batra and Stefan Lee},
  booktitle={NeurIPS},
  year={2019}
}
A visually-grounded navigation instruction can be interpreted as a sequence of expected observations and actions an agent following the correct trajectory would encounter and perform. [...] Together with a mapper that constructs a semantic spatial map on-the-fly during navigation, we formulate an end-to-end differentiable Bayes filter and train it to identify the goal by predicting the most likely trajectory through the map according to the instructions.
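As a sketch of the recursive update such a Bayes filter builds on, the following is a plain discrete-state filter over map cells, not the paper's learned, differentiable version; the transition and likelihood arrays stand in for its learned motion and observation models and the toy numbers are illustrative only.

```python
def bayes_filter_step(belief, transition, likelihood):
    """One step of a discrete Bayes filter over N map cells.

    belief:     prior probability over states, length N
    transition: transition[j][i] = P(next state j | previous state i)
    likelihood: P(current observation | state), length N
    """
    n = len(belief)
    # Prediction (motion update): push the belief through the motion model.
    predicted = [sum(transition[j][i] * belief[i] for i in range(n))
                 for j in range(n)]
    # Measurement update: reweight by the observation likelihood, normalize.
    posterior = [likelihood[j] * predicted[j] for j in range(n)]
    total = sum(posterior)
    return [p / total for p in posterior]

# Toy example: 4 cells in a row, agent tends to move one cell to the right,
# and the current observation looks most like cell 1.
belief = [1.0, 0.0, 0.0, 0.0]
transition = [
    [0.2, 0.0, 0.0, 0.0],
    [0.8, 0.2, 0.0, 0.0],
    [0.0, 0.8, 0.2, 0.0],
    [0.0, 0.0, 0.8, 1.0],
]
likelihood = [0.1, 0.7, 0.1, 0.1]
posterior = bayes_filter_step(belief, transition, likelihood)
```

Making both models differentiable networks is what lets the whole predict/update loop be trained end-to-end against the instruction-following objective.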
Rethinking the Spatial Route Prior in Vision-and-Language Navigation
  • Xinzhe Zhou, Wei Liu, Yadong Mu
  • Computer Science
  • ArXiv
  • 2021
This work addresses the task of VLN from a previously-ignored aspect, namely the spatial route prior of the navigation scenes, and proposes a sequential-decision variant and an explore-and-exploit scheme that curates a compact and informative sub-graph to exploit.
Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments
A language-guided navigation task set in a continuous 3D environment where agents must execute low-level actions to follow natural language navigation directions is developed, suggesting that performance in prior "navigation-graph" settings may be inflated by the strong implicit assumptions.
SGoLAM: Simultaneous Goal Localization and Mapping for Multi-Object Goal Navigation
  • Junho Kim, Eun Sun Lee, Mingi Lee, Donsu Zhang, Young Min Kim
  • Computer Science
  • ArXiv
  • 2021
This work presents SGoLAM, short for simultaneous goal localization and mapping, a simple and efficient algorithm for multi-object goal navigation that fully leverages the strength of classical approaches to visual navigation by decomposing the problem into two key components: mapping and goal localization.
Evolving Graphical Planner: Contextual Global Planning for Vision-and-Language Navigation
The Evolving Graphical Planner (EGP) is introduced, a model that performs global planning for navigation based on raw sensory input: it dynamically constructs a graphical representation, generalizes the action space to allow for more flexible decision making, and performs efficient planning on a proxy graph representation.
Diagnosing the Environment Bias in Vision-and-Language Navigation
This work designs novel diagnosis experiments via environment re-splitting and feature replacement, looking into possible reasons for the environment bias in VLN models, and explores several kinds of semantic representations that contain less low-level visual information.
SOON: Scenario Oriented Object Navigation with Graph-based Exploration
A novel graph-based exploration (GBE) method is proposed that outperforms various state-of-the-art methods on both the FAO and R2R datasets, and ablation studies on FAO validate the quality of the dataset.
Topological Planning with Transformers for Vision-and-Language Navigation
This work proposes a modular approach to VLN using topological maps that leverages attention mechanisms to predict a navigation plan in the map; it generates interpretable navigation plans and exhibits intelligent behaviors such as backtracking.
Visual Navigation with Spatial Attention
The attention model is shown to improve the agent's policy and to achieve state-of-the-art results on commonly-used datasets.
Scene-Intuitive Agent for Remote Embodied Visual Grounding
This paper focuses on the Remote Embodied Visual Referring Expression in Real Indoor Environments (REVERIE) task, where an agent is asked to correctly localize a remote target object specified by a concise high-level natural language instruction, and proposes a two-stage training pipeline.
Few-shot Object Grounding and Mapping for Natural Language Robot Instruction Following
This work introduces a few-shot language-conditioned object grounding method trained from augmented reality data that uses exemplars to identify objects and align them to their mentions in instructions, and presents a learned map representation that encodes object locations and their instructed use.

References

Showing 1-10 of 45 references
Cognitive Mapping and Planning for Visual Navigation
The Cognitive Mapper and Planner is based on a unified joint architecture for mapping and planning, in which mapping is driven by the needs of the task, together with a spatial memory that supports planning given an incomplete set of observations about the world.
Self-Monitoring Navigation Agent via Auxiliary Progress Estimation
A self-monitoring agent with two complementary components: (1) a visual-textual co-grounding module to locate the instruction completed in the past, the instruction required for the next action, and the next moving direction from surrounding images, and (2) a progress monitor to ensure the grounded instruction correctly reflects the navigation progress.
The Regretful Agent: Heuristic-Aided Navigation Through Progress Estimation
This paper proposes to use a progress monitor developed in prior work as a learnable heuristic for search, and proposes two modules incorporated into an end-to-end architecture that significantly outperforms current state-of-the-art methods using greedy action selection.
Tactical Rewind: Self-Correction via Backtracking in Vision-And-Language Navigation
We present the Frontier Aware Search with backTracking (FAST) Navigator, a general framework for action decoding that achieves state-of-the-art results on the 2018 Room-to-Room (R2R) challenge.
Mapping Navigation Instructions to Continuous Control Actions with Position-Visitation Prediction
This model predicts interpretable position-visitation distributions indicating where the agent should go during execution and where it should stop, and uses the predicted distributions to select the actions to execute, allowing for simple and efficient training using a combination of supervised learning and imitation learning.
Semi-parametric Topological Memory for Navigation
A new memory architecture for navigation in previously unseen environments, inspired by landmark-based navigation in animals, that consists of a (non-parametric) graph with nodes corresponding to locations in the environment and a deep network capable of retrieving nodes from the graph based on observations.
Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments
This work provides the first benchmark dataset for visually-grounded natural language navigation in real buildings, the Room-to-Room (R2R) dataset, and presents the Matterport3D Simulator, a large-scale reinforcement learning environment based on real imagery.
Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout
This paper presents a generalizable navigational agent, trained in two stages via mixed imitation and reinforcement learning, that outperforms state-of-the-art approaches by a large margin on the private unseen test set of the Room-to-Room task and achieves the top rank on the leaderboard.
Differentiable Particle Filters: End-to-End Learning with Algorithmic Priors
This work presents differentiable particle filters (DPFs), a differentiable implementation of the particle filter algorithm with learnable motion and measurement models, which encode the structure of recursive state estimation with prediction and measurement updates that operate on a probability distribution over states.
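For reference, the predict/update/resample cycle that DPFs build on looks like the classical particle filter step below; this is a non-differentiable sketch with hand-written toy motion and measurement models, whereas DPFs learn those models and soften the resampling so gradients can flow through.

```python
import random

def particle_filter_step(particles, motion, measurement, obs):
    """One predict/update/resample cycle of a classical particle filter.

    particles:   list of state samples
    motion:      motion(state) -> new state sample (prediction step)
    measurement: measurement(state, obs) -> likelihood weight
    obs:         current observation
    """
    # Prediction: propagate each particle through the (stochastic) motion model.
    moved = [motion(p) for p in particles]
    # Measurement update: weight each particle by the observation likelihood.
    weights = [measurement(p, obs) for p in moved]
    # Resampling: draw a new, equally-weighted set proportional to the weights.
    return random.choices(moved, weights=weights, k=len(moved))

# Toy 1-D example: particles drift right with noise; the measurement model
# favors states near the observed position.
random.seed(0)
particles = [0.0] * 200
motion = lambda x: x + 1.0 + random.gauss(0.0, 0.5)
measurement = lambda x, z: 2.718281828 ** (-0.5 * (x - z) ** 2)
particles = particle_filter_step(particles, motion, measurement, obs=1.0)
```

The hard `random.choices` resampling here is exactly the step that blocks gradients; replacing it and the two lambdas with learnable, differentiable components is the core move the DPF paper describes.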
Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation
A novel Reinforced Cross-Modal Matching (RCM) approach that enforces cross-modal grounding both locally and globally via reinforcement learning (RL), and a Self-Supervised Imitation Learning (SIL) method to explore unseen environments by imitating the agent's own past good decisions, are introduced.