Pushing it out of the Way: Interactive Visual Navigation

Kuo-Hao Zeng, Luca Weihs, Ali Farhadi, Roozbeh Mottaghi
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
We have observed significant progress in visual navigation for embodied agents. A common assumption in studying visual navigation is that the environments are static; this is a limiting assumption. Intelligent navigation may involve interacting with the environment beyond just moving forward/backward and turning left/right. Sometimes, the best way to navigate is to push something out of the way. In this paper, we study the problem of interactive navigation where agents learn to change the… 

Transformer Memory for Interactive Visual Navigation in Cluttered Environments

A transformer-based memory is proposed that lets agents exploit historical interaction information, together with a surrogate objective that predicts the next waypoint as an auxiliary task, which facilitates representation learning and bootstraps the RL.

Cascaded Compositional Residual Learning for Complex Interactive Behaviors

This work presents a novel framework, Cascaded Compositional Residual Learning (CCRL), which learns composite skills by recursively leveraging a library of previously learned control policies, and shows that this framework learns joint-level control policies for a diverse set of motor skills ranging from basic locomotion to complex interactive navigation.

Object Manipulation via Visual Target Localization

This work proposes Manipulation via Visual Object Location Estimation (m-VOLE), an approach that explores the environment in search of target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible, thus robustly aiding the task of manipulating these objects throughout the episode.

ASC me to Do Anything: Multi-task Training for Embodied AI

Atomic Skill Completion (ASC) is proposed, an approach for multi-task training for Embodied AI, where a set of atomic skills shared across multiple tasks are composed together to perform the tasks.

A Survey of Embodied AI: From Simulators to Research Tasks

An encyclopedic survey of the three main research tasks in embodied AI – visual exploration, visual navigation and embodied question answering – is presented, covering the state-of-the-art approaches, evaluation metrics and datasets.

GridToPix: Training Embodied Agents with Minimal Supervision

GRIDTOPIX is proposed to 1) train agents with terminal rewards in gridworlds that generically mirror Embodied AI environments, i.e., they are independent of the task; 2) distill the learned policy into agents that reside in complex visual worlds.



Proximal policy optimization algorithms. arXiv, 2017

AI2-THOR: An Interactive 3D Environment for Visual AI

AI2-THOR consists of near photo-realistic 3D indoor scenes, where AI agents can navigate in the scenes and interact with objects to perform tasks and facilitate building visually intelligent models.

Attention is All you Need

A new simple network architecture, the Transformer, is proposed. It is based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.

Rearrangement: A Challenge for Embodied AI

A framework for research and evaluation in Embodied AI is described, based on a canonical task: Rearrangement, that can focus the development of new techniques and serve as a source of trained models that can be transferred to other settings.

Beyond Tabula-Rasa: a Modular Reinforcement Learning Approach for Physically Embedded 3D Sokoban

This work explores whether integrated tasks like Mujoban can be solved by composing RL modules together in a sense-plan-act hierarchy, where modules have well-defined roles similarly to classic robot architectures, and finds that the modular RL approach dramatically outperforms the state-of-the-art monolithic RL agent on Mujoban.

Physically Embedded Planning Problems: New Challenges for Reinforcement Learning

A strong baseline is introduced that uses a pre-trained expert game player to provide hints in the abstract space to an RL agent's policy while training it on the full sensorimotor control task, underlining the need for methods that bridge the gap between abstract planning and embodied control.

AllenAct: A Framework for Embodied AI Research

AllenAct is introduced, a modular and flexible learning framework designed with a focus on the unique requirements of Embodied AI research that provides first-class support for a growing collection of embodied environments, tasks and algorithms.

ReLMoGen: Integrating Motion Generation in Reinforcement Learning for Mobile Manipulation

It is argued that, by lifting the action space and by leveraging sampling-based motion planners, this work can efficiently use RL to solve complex, long-horizon tasks that could not be solved with existing RL methods in the original action space.

Hindsight for Foresight: Unsupervised Structured Dynamics Models from Physical Interaction

This work proposes a novel approach for modeling the dynamics of a robot’s interactions directly from unlabeled 3D point clouds and images, which leads to effective, interpretable models that can be used for visuomotor control and planning.

Learning Object Relation Graph and Tentative Policy for Visual Navigation

Three complementary techniques are proposed: object relation graph (ORG), trial-driven imitation learning (IL), and a memory-augmented tentative policy network (TPN). ORG improves visual representation learning by integrating object relationships, including category closeness and spatial correlations.