Pushing it out of the Way: Interactive Visual Navigation
@article{Zeng2021PushingIO,
  title   = {Pushing it out of the Way: Interactive Visual Navigation},
  author  = {Kuo-Hao Zeng and Luca Weihs and Ali Farhadi and Roozbeh Mottaghi},
  journal = {2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year    = {2021},
  pages   = {9863-9872}
}
We have observed significant progress in visual navigation for embodied agents. A common assumption in studying visual navigation is that the environments are static; this is a limiting assumption. Intelligent navigation may involve interacting with the environment beyond just moving forward/backward and turning left/right. Sometimes, the best way to navigate is to push something out of the way. In this paper, we study the problem of interactive navigation where agents learn to change the…
6 Citations
Transformer Memory for Interactive Visual Navigation in Cluttered Environments
- Computer Science · IEEE Robotics and Automation Letters
- 2023
A transformer-based memory is proposed to let agents exploit historical interaction information, together with a surrogate objective that predicts the next waypoint as an auxiliary task, which facilitates representation learning and bootstraps the RL.
Cascaded Compositional Residual Learning for Complex Interactive Behaviors
- Computer Science · ArXiv
- 2022
This work presents a novel framework, Cascaded Compositional Residual Learning (CCRL), which learns composite skills by recursively leveraging a library of previously learned control policies, and shows that the framework learns joint-level control policies for a diverse set of motor skills ranging from basic locomotion to complex interactive navigation.
Object Manipulation via Visual Target Localization
- Computer Science · ECCV
- 2022
This work proposes Manipulation via Visual Object Location Estimation (m-VOLE), an approach that explores the environment in search of target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible, thus robustly aiding the task of manipulating these objects throughout the episode.
ASC me to Do Anything: Multi-task Training for Embodied AI
- Computer Science · ArXiv
- 2022
Atomic Skill Completion (ASC) is proposed, an approach for multi-task training for Embodied AI, where a set of atomic skills shared across multiple tasks are composed together to perform the tasks.
A Survey of Embodied AI: From Simulators to Research Tasks
- Computer Science · IEEE Transactions on Emerging Topics in Computational Intelligence
- 2022
An encyclopedic survey of the three main research tasks in embodied AI – visual exploration, visual navigation and embodied question answering – is presented, covering the state-of-the-art approaches, evaluation metrics and datasets.
GridToPix: Training Embodied Agents with Minimal Supervision
- Computer Science · 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2021
GRIDTOPIX is proposed to 1) train agents with terminal rewards in gridworlds that generically mirror Embodied AI environments, i.e., are independent of the task, and 2) distill the learned policy into agents that reside in complex visual worlds.
References
SHOWING 1-10 OF 46 REFERENCES
Proximal Policy Optimization Algorithms
- Computer Science · ArXiv
- 2017
AI2-THOR: An Interactive 3D Environment for Visual AI
- Computer Science · ArXiv
- 2017
AI2-THOR consists of near photo-realistic 3D indoor scenes, where AI agents can navigate in the scenes and interact with objects to perform tasks and facilitate building visually intelligent models.
Attention is All you Need
- Computer Science · NIPS
- 2017
A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as demonstrated by successful application to English constituency parsing with both large and limited training data.
Rearrangement: A Challenge for Embodied AI
- Computer Science · ArXiv
- 2020
A framework for research and evaluation in Embodied AI is described, based on a canonical task: Rearrangement, that can focus the development of new techniques and serve as a source of trained models that can be transferred to other settings.
Beyond Tabula-Rasa: a Modular Reinforcement Learning Approach for Physically Embedded 3D Sokoban
- Computer Science · ArXiv
- 2020
This work explores whether integrated tasks like Mujoban can be solved by composing RL modules together in a sense-plan-act hierarchy, where modules have well-defined roles similar to classic robot architectures, and finds that the modular RL approach dramatically outperforms the state-of-the-art monolithic RL agent on Mujoban.
Physically Embedded Planning Problems: New Challenges for Reinforcement Learning
- Computer Science · ArXiv
- 2020
A strong baseline is introduced that uses a pre-trained expert game player to provide hints in the abstract space to an RL agent's policy while training it on the full sensorimotor control task, underlining the need for methods that bridge the gap between abstract planning and embodied control.
AllenAct: A Framework for Embodied AI Research
- Computer Science · ArXiv
- 2020
AllenAct is introduced, a modular and flexible learning framework designed with a focus on the unique requirements of Embodied AI research that provides first-class support for a growing collection of embodied environments, tasks and algorithms.
ReLMoGen: Integrating Motion Generation in Reinforcement Learning for Mobile Manipulation
- Computer Science · 2021 IEEE International Conference on Robotics and Automation (ICRA)
- 2021
It is argued that, by lifting the action space and by leveraging sampling-based motion planners, this work can efficiently use RL to solve complex, long-horizon tasks that could not be solved with existing RL methods in the original action space.
Hindsight for Foresight: Unsupervised Structured Dynamics Models from Physical Interaction
- Computer Science · 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- 2020
This work proposes a novel approach for modeling the dynamics of a robot’s interactions directly from unlabeled 3D point clouds and images, which leads to effective, interpretable models that can be used for visuomotor control and planning.
Learning Object Relation Graph and Tentative Policy for Visual Navigation
- Computer Science · ECCV
- 2020
Three complementary techniques are proposed: an object relation graph (ORG), trial-driven imitation learning (IL), and a memory-augmented tentative policy network (TPN); the ORG improves visual representation learning by integrating object relationships, including category closeness and spatial correlations.