Corpus ID: 220424436

Auxiliary Tasks Speed Up Learning PointGoal Navigation

@inproceedings{Ye2020AuxiliaryTS,
  title={Auxiliary Tasks Speed Up Learning PointGoal Navigation},
  author={Joel Ye and Dhruv Batra and Erik Wijmans and Abhishek Das},
  booktitle={CoRL},
  year={2020}
}
PointGoal Navigation is an embodied task that requires agents to navigate to a specified point in an unseen environment. Wijmans et al. showed that this task is solvable but their method is computationally prohibitive, requiring 2.5 billion frames and 180 GPU-days. In this work, we develop a method to significantly increase sample and time efficiency in learning PointNav using self-supervised auxiliary tasks (e.g. predicting the action taken between two egocentric observations, predicting the… 
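As a rough illustration of the kind of auxiliary objective the abstract describes, the sketch below shows a minimal inverse-dynamics head in PyTorch: given embeddings of two consecutive egocentric observations, it predicts the action taken between them, and the resulting cross-entropy loss can be added to the main RL loss. This is a hypothetical sketch, not the paper's implementation; all names (InverseDynamicsHead, obs_embed_t, the 0.1 loss weight) are assumptions for illustration only.

```python
# Minimal sketch (not the paper's code) of an inverse-dynamics auxiliary task:
# predict the action taken between two consecutive egocentric observations.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InverseDynamicsHead(nn.Module):
    """Predicts the discrete action from embeddings of consecutive frames."""
    def __init__(self, embed_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * embed_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, obs_embed_t: torch.Tensor, obs_embed_tp1: torch.Tensor) -> torch.Tensor:
        # Concatenate the two observation embeddings and score each action.
        return self.mlp(torch.cat([obs_embed_t, obs_embed_tp1], dim=-1))

# Usage: compute the auxiliary loss and add it to the RL loss with a small weight.
embed_dim, num_actions, batch = 512, 4, 8
head = InverseDynamicsHead(embed_dim, num_actions)
obs_t = torch.randn(batch, embed_dim)       # visual encoder output at step t
obs_tp1 = torch.randn(batch, embed_dim)     # visual encoder output at step t+1
actions = torch.randint(0, num_actions, (batch,))  # actions actually taken

aux_loss = F.cross_entropy(head(obs_t, obs_tp1), actions)
# total_loss = rl_loss + 0.1 * aux_loss     # weighting is a hypothetical choice
```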
Environment Predictive Coding for Embodied Agents
TLDR
The environment predictive coding method is introduced, a self-supervised approach to learning environment-level representations for embodied agents that outperforms the state of the art on challenging tasks with only a limited budget of experience.
Towards Disturbance-Free Visual Mobile Manipulation
TLDR
This work develops a new disturbance-avoidance methodology, at the heart of which is the auxiliary task of disturbance prediction, which greatly enhances sample efficiency and final performance by distilling knowledge of disturbances into the agent.
A Survey of Embodied AI: From Simulators to Research Tasks
TLDR
An encyclopedic survey of the three main research tasks in embodied AI – visual exploration, visual navigation, and embodied question answering – covering state-of-the-art approaches, evaluation metrics, and datasets.
Auxiliary Tasks and Exploration Enable ObjectGoal Navigation
TLDR
This work proposes that agents will act to simplify their visual inputs so as to smooth their RNN dynamics, and that auxiliary tasks reduce overfitting by minimizing effective RNN dimensionality; i.e. a performant ObjectNav agent that must maintain coherent plans over long horizons does so by learning smooth, low-dimensional recurrent dynamics.
A Simple Structure For Building A Robust Model
TLDR
A simple architecture for building a model with a certain degree of robustness is proposed; it improves the robustness of the trained network by adding an adversarial-sample detection network for cooperative training.
Embodied Navigation at the Art Gallery
TLDR
This paper builds and releases a new 3D space with unique characteristics: that of a complete art museum, named ArtGallery3D (AG3D), which is larger, richer in visual features, and provides very sparse occupancy information.
Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale
TLDR
A large-scale study of imitating human demonstrations on tasks that require a virtual robot to search for objects in new environments, using the Habitat simulator running in a web browser connected to Amazon Mechanical Turk as virtual teleoperation data-collection infrastructure, to answer the question: how does large-scale imitation learning (IL) compare to reinforcement learning (RL)?
How to Train PointGoal Navigation Agents on a (Sample and Compute) Budget
TLDR
This paper conducts an extensive set of experiments and identifies and discusses a number of ostensibly minor but significant design choices -- the advantage estimation procedure, visual encoder architecture, and a seemingly minor hyper-parameter change -- that lead to considerable and consistent improvements over the baselines presented in Savva et al.
Offline Visual Representation Learning for Embodied Navigation
TLDR
While the benefits of pretraining sometimes diminish (or entirely disappear) with long finetuning schedules, OVRL’s performance gains continue to increase (not decrease) as the agent is trained for 2 billion frames of experience.

References

SHOWING 1-10 OF 55 REFERENCES
Self-Monitoring Navigation Agent via Auxiliary Progress Estimation
TLDR
A self-monitoring agent with two complementary components: (1) visual-textual co-grounding module to locate the instruction completed in the past, the instruction required for the next action, and the next moving direction from surrounding images and (2) progress monitor to ensure the grounded instruction correctly reflects the navigation progress.
Learning to Navigate in Complex Environments
TLDR
This work considers jointly learning the goal-driven reinforcement learning problem with auxiliary depth prediction and loop closure classification tasks and shows that data efficiency and task performance can be dramatically improved by relying on additional auxiliary tasks leveraging multimodal sensory inputs.
Mid-Level Visual Representations Improve Generalization and Sample Efficiency for Learning Visuomotor Policies
TLDR
This work finds that using a mid-level perception confers significant advantages over training end-to-end from scratch (i.e. not leveraging priors) in navigation-oriented tasks and develops an efficient max-coverage feature set that can be adopted in lieu of raw images.
Situational Fusion of Visual Representation for Visual Navigation
TLDR
This work proposes to train an agent to fuse a large set of visual representations that correspond to diverse visual perception abilities, and develops an action-level representation fusion scheme, which predicts an action candidate from each representation and adaptively consolidates these action candidates into the final action.
Semi-parametric Topological Memory for Navigation
TLDR
A new memory architecture for navigation in previously unseen environments, inspired by landmark-based navigation in animals, that consists of a (non-parametric) graph with nodes corresponding to locations in the environment and a deep network capable of retrieving nodes from the graph based on observations.
Neural Modular Control for Embodied Question Answering
TLDR
This work uses imitation learning to warm-start policies at each level of the hierarchy, dramatically increasing sample efficiency, followed by reinforcement learning, to learn policies for navigation over long planning horizons from language input.
SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation
We propose SplitNet, a method for decoupling visual perception and policy learning. By incorporating auxiliary tasks and selective learning of portions of the model, we explicitly decompose the
Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments
TLDR
This work provides the first benchmark dataset for visually-grounded natural language navigation in real buildings - the Room-to-Room (R2R) dataset - and presents the Matterport3D Simulator, a large-scale reinforcement learning environment based on real imagery.
Embodied Question Answering in Photorealistic Environments With Point Cloud Perception
TLDR
It is found that point clouds provide a richer signal than RGB images for learning obstacle avoidance, motivating the use (and continued study) of 3D deep learning models for embodied navigation.
Taskonomy: Disentangling Task Transfer Learning
TLDR
This work proposes a fully computational approach for modeling the structure of the space of visual tasks by finding (first and higher-order) transfer learning dependencies across a dictionary of twenty-six 2D, 2.5D, 3D, and semantic tasks in a latent space, and provides a computational taxonomic map for task transfer learning.