RoboTHOR: An Open Simulation-to-Real Embodied AI Platform
@inproceedings{Deitke2020RoboTHORAO,
  title     = {RoboTHOR: An Open Simulation-to-Real Embodied AI Platform},
  author    = {Matt Deitke and Winson Han and Alvaro Herrasti and Aniruddha Kembhavi and Eric Kolve and Roozbeh Mottaghi and Jordi Salvador and Dustin Schwenk and Eli VanderBilt and Matthew Wallingford and Luca Weihs and Mark Yatskar and Ali Farhadi},
  booktitle = {2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2020},
  pages     = {3161-3171}
}
Visual recognition ecosystems (e.g. ImageNet, Pascal, COCO) have undeniably played a prevailing role in the evolution of modern computer vision. We argue that interactive and embodied visual AI has reached a stage of development similar to visual recognition prior to the advent of these ecosystems. Recently, various synthetic environments have been introduced to facilitate research in embodied AI. Notwithstanding this progress, the crucial question of how well models trained in simulation…
65 Citations
Out of the Box: Embodied Navigation in the Real World
- Computer Science, CAIP
- 2021
This work describes the architectural discrepancies that damage the Sim2Real adaptation ability of models trained on the Habitat simulator and proposes a novel solution tailored towards deployment in real-world scenarios.
AllenAct: A Framework for Embodied AI Research
- Computer Science, ArXiv
- 2020
AllenAct is introduced, a modular and flexible learning framework designed with a focus on the unique requirements of Embodied AI research that provides first-class support for a growing collection of embodied environments, tasks and algorithms.
Towards Disturbance-Free Visual Mobile Manipulation
- Computer Science, ArXiv
- 2021
This work develops a new disturbance-avoidance methodology centered on the auxiliary task of disturbance prediction, which greatly improves sample efficiency and final performance by distilling disturbance knowledge into the agent.
On Embodied Visual Navigation in Real Environments Through Habitat
- Computer Science, 2020 25th International Conference on Pattern Recognition (ICPR)
- 2021
A tool based on the Habitat simulator is proposed that exploits real-world images of the environment, together with sensor and actuator noise models, to produce more realistic navigation episodes; it can effectively help train and evaluate navigation policies on real-world observations without running navigation episodes in the real world.
BenchBot environments for active robotics (BEAR): Simulated data for active scene understanding research
- Computer Science, The International Journal of Robotics Research
- 2022
This work presents a platform to foster research in active scene understanding, consisting of high-fidelity simulated environments and a simple yet powerful API that controls a mobile robot in simulation and reality, and provides three levels of robot agency.
Core Challenges in Embodied Vision-Language Planning
- Computer Science, ArXiv
- 2021
A taxonomy is proposed to unify Embodied Vision-Language Planning tasks, a family of prominent embodied navigation and manipulation problems that jointly use computer vision and natural language, and an in-depth analysis and comparison of new and current algorithmic approaches, metrics, simulated environments, and datasets used for EVLP tasks is presented.
Deep Learning for Embodied Vision Navigation: A Survey
- Computer Science, ArXiv
- 2021
This paper presents a comprehensive review of embodied navigation tasks and recent progress in deep learning-based methods, covering two major tasks: target-oriented navigation and instruction-oriented navigation.
Visual Room Rearrangement
- Computer Science, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2021
The experiments show that solving this challenging interactive task, which involves navigation and object interaction, is beyond the capabilities of current state-of-the-art techniques for embodied tasks, and that the field is still very far from achieving perfect performance on these types of tasks.
Towards Explainable Embodied AI
- Computer Science
- 2021
The proposed explainability methods for embodied AI facilitate the analysis of policy failure cases in different out-of-distribution scenarios and conclude that embodied AI policies can be understood with feature attributions to explain how input state features influence the predicted actions.
ForeSI: Success-Aware Visual Navigation Agent
- Computer Science, 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
- 2022
This work augments model-free RL with a forward model that predicts, from the beginning of a navigation episode, a representation of a future state the agent would reach if the episode were successful, and develops an algorithm that integrates a replay buffer into the models, alternating between training the policy and the forward model.
References
Showing 1-10 of 81 references
Gibson Env: Real-World Perception for Embodied Agents
- Computer Science, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
This paper investigates developing real-world perception for active agents, proposes Gibson Environment for this purpose, and showcases a set of perceptual tasks learned therein.
Habitat: A Platform for Embodied AI Research
- Computer Science, 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
The comparison between learning and SLAM approaches from two recent works is revisited, finding evidence that learning outperforms SLAM when scaled to an order of magnitude more experience than in previous investigations, and the first cross-dataset generalization experiments are conducted.
Domain randomization for transferring deep neural networks from simulation to the real world
- Computer Science, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- 2017
This paper explores domain randomization, a simple technique for training models on simulated images that transfer to real images by randomizing rendering in the simulator, and achieves the first successful transfer of a deep neural network trained only on simulated RGB images to the real world for the purpose of robotic control.
Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments
- Computer Science, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
This work provides the first benchmark dataset for visually-grounded natural language navigation in real buildings, the Room-to-Room (R2R) dataset, and presents the Matterport3D Simulator, a large-scale reinforcement learning environment based on real imagery.
Sim-to-Real Transfer of Robotic Control with Dynamics Randomization
- Computer Science, 2018 IEEE International Conference on Robotics and Automation (ICRA)
- 2018
By randomizing the dynamics of the simulator during training, this paper is able to develop policies that are capable of adapting to very different dynamics, including ones that differ significantly from the dynamics on which the policies were trained.
Self-Supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation
- Computer Science, 2018 IEEE International Conference on Robotics and Automation (ICRA)
- 2018
A generalized computation graph is proposed that subsumes value-based model-free methods and model-based methods, and is instantiated to form a navigation model that learns from raw images, is sample-efficient, and outperforms single-step and double-step double Q-learning.
Transferring End-to-End Visuomotor Control from Simulation to Real World for a Multi-Stage Task
- Computer Science, CoRL
- 2017
This paper shows how two simple techniques can lead to end-to-end (image to velocity) execution of a multi-stage task, which is analogous to a simple tidying routine, without having seen a single real image.
AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles
- Computer Science, FSR
- 2017
A new simulator built on Unreal Engine that offers physically and visually realistic simulations for autonomous vehicles in the real world, and that is designed from the ground up to be extensible to accommodate new types of vehicles, hardware platforms, and software protocols.
(CAD)$^2$RL: Real Single-Image Flight without a Single Real Image
- Computer Science, Robotics: Science and Systems
- 2017
This paper proposes a learning method called CAD$^2$RL, which can be used to perform collision-free indoor flight in the real world while being trained entirely on 3D CAD models, and shows that it can train a policy that generalizes to the real world without requiring the simulator to be particularly realistic or high-fidelity.
Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping
- Computer Science, 2018 IEEE International Conference on Robotics and Automation (ICRA)
- 2018
This work studies how randomized simulated environments and domain adaptation methods can be extended to train a grasping system to grasp novel objects from raw monocular RGB images, including a novel extension of pixel-level domain adaptation termed GraspGAN.