SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation

@article{Gordon2019SplitNetSA,
  title={SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation},
  author={Daniel Gordon and Abhishek Kadian and Devi Parikh and Judy Hoffman and Dhruv Batra},
  journal={2019 IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2019},
  pages={1022-1031}
}
We propose SplitNet, a method for decoupling visual perception and policy learning. By incorporating auxiliary tasks and selective learning of portions of the model, we explicitly decompose the learning objectives for visual navigation into perceiving the world and acting on that perception. We show improvements over baseline models on transferring between simulators, an encouraging step towards Sim2Real. Additionally, SplitNet generalizes better to unseen environments from the same simulator…
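As a rough illustration of the decoupling described above, the PyTorch sketch below (not the authors' code; every module name, layer size, and the depth-prediction auxiliary head are illustrative assumptions) pairs a shared visual encoder with an auxiliary perception head and a separate policy head, so that transfer can update one part of the model while freezing the other.

import torch
import torch.nn as nn

class SplitNetSketch(nn.Module):
    """Hypothetical decoupled model: perception (encoder) vs. action (policy)."""

    def __init__(self, num_actions: int, feat_dim: int = 256):
        super().__init__()
        # Perception: egocentric RGB -> feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(feat_dim), nn.ReLU(),
        )
        # Auxiliary head (illustrative): coarse depth prediction gives the
        # encoder a training signal that does not depend on task reward.
        self.depth_head = nn.Linear(feat_dim, 24 * 24)
        # Policy: features -> action logits.
        self.policy = nn.Linear(feat_dim, num_actions)

    def forward(self, rgb):
        feats = self.encoder(rgb)
        return self.policy(feats), self.depth_head(feats)

def freeze(module: nn.Module):
    for p in module.parameters():
        p.requires_grad_(False)

model = SplitNetSketch(num_actions=4)
model(torch.zeros(1, 3, 224, 224))  # materialize the lazy layer before freezing

# Sim2Sim-style transfer: keep the learned behavior fixed and re-adapt
# perception in the new simulator via the auxiliary losses.
freeze(model.policy)
# Task2Task-style transfer would instead freeze the encoder and retrain
# only the policy head for the new task:
# freeze(model.encoder)

The point of the split is that each transfer direction touches only the parameters whose training signal is actually available in the new setting.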
Environment Predictive Coding for Embodied Agents
TLDR
The environment predictive coding method is introduced, a self-supervised approach to learn environment-level representations for embodied agents that outperforms the state-of-the-art on challenging tasks with only a limited budget of experience.
Unsupervised Domain Adaptation for Visual Navigation
TLDR
This paper proposes an unsupervised domain adaptation method for visual navigation that translates the images in the target domain to the source domain such that the translation is consistent with the representations learned by the navigation policy.
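One way to read this recipe, sketched below under stated assumptions, is as a feature-level consistency constraint on the image translator: translated target-domain observations should preserve the representation the pretrained policy relies on. The PyTorch snippet is not the paper's implementation; policy_encoder, translator, and the 64x64 input size are hypothetical, and the domain-alignment objective that would push translations toward the source domain is reduced to a comment.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-ins: the (frozen) visual layers of a pretrained
# navigation policy, and a small image-to-image translator G.
policy_encoder = nn.Sequential(
    nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
    nn.Flatten(), nn.Linear(32 * 15 * 15, 128),  # assumes 64x64 RGB inputs
)
translator = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1),
)
for p in policy_encoder.parameters():
    p.requires_grad_(False)

def representation_consistency(target_batch):
    # Translated images should keep the features the policy was trained on.
    # A full method would add a domain-alignment term (e.g. an adversarial
    # loss pulling translator outputs toward the source domain), omitted here.
    translated = translator(target_batch)
    with torch.no_grad():
        reference = policy_encoder(target_batch)
    return F.mse_loss(policy_encoder(translated), reference)

loss = representation_consistency(torch.rand(8, 3, 64, 64))
loss.backward()  # gradients reach only the translator; the policy stays fixed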
Auto-Navigator: Decoupled Neural Architecture Search for Visual Navigation
TLDR
This paper proposes Auto-Navigator, which uses neural architecture search to customize a specialized network for visual navigation and trains the resulting navigation policy with imitation learning (IL) on optimal paths.
NeoNav: Improving the Generalization of Visual Navigation via Generating Next Expected Observations
We propose improving the cross-target and cross-scene generalization of visual navigation through learning an agent that is guided by conceiving the next observations it expects to see…
Audio-Visual Waypoints for Navigation
TLDR
This work introduces a reinforcement learning approach to audio-visual navigation with two key novel elements: 1) audio-visual waypoints that are dynamically set and learned end-to-end within the navigation policy, and 2) an acoustic memory that provides a structured, spatially grounded record of what the agent has heard as it moves.
Self-Supervised Domain Adaptation for Visual Navigation with Global Map Consistency
  • Eun Sun Lee, Junho Kim, Young Min Kim
  • 2021
TLDR
This work proposes a lightweight, self-supervised adaptation method for a visual navigation agent to generalize to unseen environments, and demonstrates test-time adaptation with the proposed task to show its potential applicability in real-world deployment.
Audio-Visual Embodied Navigation
TLDR
This work develops a multi-modal deep reinforcement learning pipeline to train navigation policies end-to-end from a stream of egocentric audio-visual observations, allowing the agent to discover elements of the geometry of the physical space indicated by the reverberating audio and to detect and follow sound-emitting targets.
Deep Learning for Embodied Vision Navigation: A Survey
TLDR
This paper presents a comprehensive review of embodied navigation tasks and the recent progress in deep-learning-based methods, covering two major tasks: target-oriented navigation and instruction-oriented navigation.
MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation
TLDR
This work proposes the multiON task, which requires navigation to an episode-specific sequence of objects in a realistic environment, generalizes the ObjectGoal navigation task, and explicitly tests the ability of navigation agents to locate previously observed goal objects.

References

Showing 1-10 of 58 references
Target-driven visual navigation in indoor scenes using deep reinforcement learning
TLDR
This paper proposes an actor-critic model whose policy is a function of the goal as well as the current state, which allows better generalization, and introduces the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine.
Visual Semantic Planning Using Deep Successor Representations
TLDR
This work addresses the problem of visual semantic planning: the task of predicting a sequence of actions from visual observations that transform a dynamic environment from an initial state to a goal state, and develops a deep predictive model based on successor representations.
AI2-THOR: An Interactive 3D Environment for Visual AI
TLDR
AI2-THOR consists of near photo-realistic 3D indoor scenes in which AI agents can navigate and interact with objects to perform tasks, facilitating the building of visually intelligent models.
Mid-Level Visual Representations Improve Generalization and Sample Efficiency for Learning Active Tasks
TLDR
It is shown that learning active tasks with mid-level features is significantly more sample-efficient than learning from scratch and able to generalize in situations where the from-scratch approach fails, and that proper use of mid-level perception confers significant advantages over training from scratch.
Sim-to-Real Robot Learning from Pixels with Progressive Nets
TLDR
This work proposes using progressive networks to bridge the reality gap and transfer learned policies from simulation to the real world, and presents an early demonstration of this approach with a number of experiments in the domain of robot manipulation that focus on bridging the reality gap.
Cognitive Mapping and Planning for Visual Navigation
TLDR
The Cognitive Mapper and Planner is based on a unified joint architecture for mapping and planning, such that mapping is driven by the needs of the task, and on a spatial memory with the ability to plan given an incomplete set of observations about the world.
Taskonomy: Disentangling Task Transfer Learning
TLDR
This work proposes a fully computational approach for modeling the structure of the space of visual tasks by finding (first- and higher-order) transfer learning dependencies across a dictionary of twenty-six 2D, 2.5D, 3D, and semantic tasks in a latent space, and provides a computational taxonomic map for task transfer learning.
Driving Policy Transfer via Modularity and Abstraction
TLDR
This work presents an approach, inspired by classic driving systems, for transferring driving policies from simulation to reality via modularity and abstraction, aiming to combine the benefits of modular architectures and end-to-end deep learning approaches.
Learning to Navigate in Complex Environments
TLDR
This work considers jointly learning the goal-driven reinforcement learning problem with auxiliary depth prediction and loop closure classification tasks, and shows that data efficiency and task performance can be dramatically improved by relying on additional auxiliary tasks that leverage multimodal sensory inputs.
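The recipe above boils down to adding weighted auxiliary losses to the reinforcement learning objective. The sketch below is illustrative only: the original work casts depth prediction as classification over quantized depth bins, whereas this simplification uses regression, and the function name and loss weights are invented.

import torch.nn.functional as F

def navigation_loss(rl_loss, depth_pred, depth_true, loop_logits, loop_labels,
                    w_depth=1.0, w_loop=1.0):
    # Auxiliary task 1: predict depth from the agent's visual features
    # (regression here; the paper uses classification over depth bins).
    depth_loss = F.mse_loss(depth_pred, depth_true)
    # Auxiliary task 2: classify whether the agent has closed a loop
    # (loop_labels are float targets in [0, 1]).
    loop_loss = F.binary_cross_entropy_with_logits(loop_logits, loop_labels)
    # Illustrative weights w_depth and w_loop trade off the auxiliary signals.
    return rl_loss + w_depth * depth_loss + w_loop * loop_loss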
Sim2Real Viewpoint Invariant Visual Servoing by Recurrent Control
TLDR
This paper trains a deep recurrent controller that can automatically determine which actions move the end-effector of a robotic arm to a desired object, and describes how the resulting model can be transferred to a real-world robot by disentangling perception from control and only adapting the visual layers.