RoboTHOR: An Open Simulation-to-Real Embodied AI Platform

  • Matt Deitke, Winson Han, Alvaro Herrasti, Aniruddha Kembhavi, Eric Kolve, Roozbeh Mottaghi, Jordi Salvador, Dustin Schwenk, Eli VanderBilt, Matthew Wallingford, Luca Weihs, Mark Yatskar, Ali Farhadi
  • Published 14 April 2020
  • Computer Science
  • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Visual recognition ecosystems (e.g. ImageNet, Pascal, COCO) have undeniably played a prevailing role in the evolution of modern computer vision. We argue that interactive and embodied visual AI has reached a stage of development similar to visual recognition prior to the advent of these ecosystems. Recently, various synthetic environments have been introduced to facilitate research in embodied AI. Notwithstanding this progress, the crucial question of how well models trained in simulation… 

Out of the Box: Embodied Navigation in the Real World
This work describes the architectural discrepancies that impair the Sim2Real adaptation ability of models trained on the Habitat simulator and proposes a novel solution tailored to deployment in real-world scenarios.
AllenAct: A Framework for Embodied AI Research
This work introduces AllenAct, a modular and flexible learning framework designed around the unique requirements of Embodied AI research, with first-class support for a growing collection of embodied environments, tasks and algorithms.
ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
The proposed PROCTHOR, a framework for procedural generation of Embodied AI environments, enables us to sample arbitrarily large datasets of diverse, interactive, customizable, and performant virtual environments to train and evaluate embodied agents across navigation, interaction, and manipulation tasks.
A Survey of Embodied AI: From Simulators to Research Tasks
An encyclopedic survey of the three main research tasks in embodied AI – visual exploration, visual navigation and embodied question answering – is presented, covering the state-of-the-art approaches, evaluation metrics and datasets.
Towards Disturbance-Free Visual Mobile Manipulation
This work develops a new disturbance-avoidance methodology at the heart of which is the auxiliary task of disturbance prediction, which greatly enhances sample efficiency and final performance by knowledge distillation of disturbance into the agent.
On Embodied Visual Navigation in Real Environments Through Habitat
A tool based on the Habitat simulator is proposed which exploits real world images of the environment, together with sensor and actuator noise models, to produce more realistic navigation episodes and can effectively help to train and evaluate navigation policies on real-world observations without running navigation episodes in the real world.
BenchBot environments for active robotics (BEAR): Simulated data for active scene understanding research
This work presents a platform to foster research in active scene understanding, consisting of high-fidelity simulated environments and a simple yet powerful API that controls a mobile robot in simulation and reality, and provides three levels of robot agency.
Core Challenges in Embodied Vision-Language Planning
This paper proposes a taxonomy to unify Embodied Vision-Language Planning tasks, a family of prominent embodied navigation and manipulation problems that jointly use computer vision and natural language, and presents the core challenges that new EVLP works should seek to address and advocates for task construction that enables model generalizability and furthers real-world deployment.
Rethinking Sim2Real: Lower Fidelity Simulation Leads to Higher Sim2Real Transfer in Navigation
The results show that, contrary to expectation, adding fidelity does not help with learning; performance is poor due to slow simulation speed (preventing large-scale learning) and overfitting to inaccuracies in simulation physics.
The ThreeDWorld Transport Challenge: A Visually Guided Task-and-Motion Planning Benchmark for Physically Realistic Embodied AI
This work builds this benchmark challenge using the ThreeDWorld simulation: a virtual 3D environment where all objects respond to physics, and a robot agent can be controlled using a fully physics-driven navigation and interaction API.


Gibson Env: Real-World Perception for Embodied Agents
This paper investigates developing real-world perception for active agents, proposes Gibson Environment for this purpose, and showcases a set of perceptual tasks learned therein.
Habitat: A Platform for Embodied AI Research
The comparison between learning and SLAM approaches from two recent works is revisited, and evidence is found that learning outperforms SLAM if scaled to an order of magnitude more experience than previous investigations; the first cross-dataset generalization experiments are also conducted.
Domain randomization for transferring deep neural networks from simulation to the real world
This paper explores domain randomization, a simple technique for training models on simulated images that transfer to real images by randomizing rendering in the simulator, and achieves the first successful transfer of a deep neural network trained only on simulated RGB images to the real world for the purpose of robotic control.
Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments
This work provides the first benchmark dataset for visually-grounded natural language navigation in real buildings, the Room-to-Room (R2R) dataset, and presents the Matterport3D Simulator, a large-scale reinforcement learning environment based on real imagery.
Sim-to-Real Transfer of Robotic Control with Dynamics Randomization
By randomizing the dynamics of the simulator during training, this paper is able to develop policies that are capable of adapting to very different dynamics, including ones that differ significantly from the dynamics on which the policies were trained.
Self-Supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation
A generalized computation graph is proposed that subsumes value-based model-free methods and model-based methods, and is instantiated to form a navigation model that learns from raw images, is sample efficient, and outperforms single-step and double-step double Q-learning.
Transferring End-to-End Visuomotor Control from Simulation to Real World for a Multi-Stage Task
This paper shows how two simple techniques can lead to end-to-end (image to velocity) execution of a multi-stage task, which is analogous to a simple tidying routine, without having seen a single real image.
AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles
A new simulator built on Unreal Engine that offers physically and visually realistic simulations for autonomous vehicles in the real world and that is designed from the ground up to be extensible to accommodate new types of vehicles, hardware platforms and software protocols.
(CAD)$^2$RL: Real Single-Image Flight without a Single Real Image
This paper proposes a learning method called CAD$^2$RL, which can be used to perform collision-free indoor flight in the real world while being trained entirely on 3D CAD models, and shows that it can train a policy that generalizes to the real world without requiring the simulator to be particularly realistic or high-fidelity.
Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping
This work studies how randomized simulated environments and domain adaptation methods can be extended to train a grasping system to grasp novel objects from raw monocular RGB images, including a novel extension of pixel-level domain adaptation termed the GraspGAN.