RoboTHOR: An Open Simulation-to-Real Embodied AI Platform

@article{Deitke2020RoboTHORAO,
  title={RoboTHOR: An Open Simulation-to-Real Embodied AI Platform},
  author={Matt Deitke and Winson Han and Alvaro Herrasti and Aniruddha Kembhavi and Eric Kolve and Roozbeh Mottaghi and Jordi Salvador and Dustin Schwenk and Eli VanderBilt and Matthew Wallingford and Luca Weihs and Mark Yatskar and Ali Farhadi},
  journal={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020},
  pages={3161-3171}
}
  • Published 14 April 2020
Visual recognition ecosystems (e.g. ImageNet, Pascal, COCO) have undeniably played a prevailing role in the evolution of modern computer vision. We argue that interactive and embodied visual AI has reached a stage of development similar to visual recognition prior to the advent of these ecosystems. Recently, various synthetic environments have been introduced to facilitate research in embodied AI. Notwithstanding this progress, the crucial question of how well models trained in simulation… 

Citations

Out of the Box: Embodied Navigation in the Real World
TLDR
This work describes the architectural discrepancies that damage the Sim2Real adaptation ability of models trained on the Habitat simulator and proposes a novel solution tailored towards deployment in real-world scenarios.
AllenAct: A Framework for Embodied AI Research
TLDR
AllenAct is introduced, a modular and flexible learning framework designed around the unique requirements of Embodied AI research, which provides first-class support for a growing collection of embodied environments, tasks, and algorithms.
Towards Disturbance-Free Visual Mobile Manipulation
TLDR
This work develops a new disturbance-avoidance methodology, at the heart of which is the auxiliary task of disturbance prediction, which greatly enhances sample efficiency and final performance by distilling disturbance knowledge into the agent.
On Embodied Visual Navigation in Real Environments Through Habitat
TLDR
A tool based on the Habitat simulator is proposed that exploits real-world images of the environment, together with sensor and actuator noise models, to produce more realistic navigation episodes; it can effectively help train and evaluate navigation policies on real-world observations without running navigation episodes in the real world.
BenchBot environments for active robotics (BEAR): Simulated data for active scene understanding research
TLDR
This work presents a platform to foster research in active scene understanding, consisting of high-fidelity simulated environments and a simple yet powerful API that controls a mobile robot in simulation and reality, and provides three levels of robot agency.
Core Challenges in Embodied Vision-Language Planning
TLDR
A taxonomy is proposed to unify Embodied Vision-Language Planning tasks, a family of prominent embodied navigation and manipulation problems that jointly use computer vision and natural language, and an in-depth analysis and comparison of new and current algorithmic approaches, metrics, simulated environments, and the datasets used for EVLP tasks are presented.
Deep Learning for Embodied Vision Navigation: A Survey
TLDR
This paper presents a comprehensive review of embodied navigation tasks and the recent progress in deep learning-based methods, covering two major tasks: target-oriented navigation and instruction-oriented navigation.
Visual Room Rearrangement
TLDR
The experiments show that solving this challenging interactive task, which involves navigation and object interaction, is beyond the capabilities of current state-of-the-art techniques for embodied tasks, and that current methods remain very far from perfect performance on these types of tasks.
Towards Explainable Embodied AI
TLDR
The proposed explainability methods for embodied AI facilitate the analysis of policy failure cases in different out-of-distribution scenarios, leading to the conclusion that embodied AI policies can be understood with feature attributions that explain how input state features influence the predicted actions.
ForeSI: Success-Aware Visual Navigation Agent
TLDR
This work augments model-free RL with a forward model that, from the beginning of a navigation episode, predicts a representation of a future state as if the episode were to be successful, and develops an algorithm that integrates a replay buffer into the model and alternates between training the policy and the forward model.

References

SHOWING 1-10 OF 81 REFERENCES
Gibson Env: Real-World Perception for Embodied Agents
TLDR
This paper investigates developing real-world perception for active agents, proposes Gibson Environment for this purpose, and showcases a set of perceptual tasks learned therein.
Habitat: A Platform for Embodied AI Research
TLDR
The comparison between learning and SLAM approaches from two recent works is revisited, and evidence is found that learning outperforms SLAM if scaled to an order of magnitude more experience than in previous investigations; the first cross-dataset generalization experiments are also conducted.
Domain randomization for transferring deep neural networks from simulation to the real world
TLDR
This paper explores domain randomization, a simple technique for training models on simulated images that transfer to real images by randomizing rendering in the simulator, and achieves the first successful transfer of a deep neural network trained only on simulated RGB images to the real world for the purpose of robotic control.
Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments
TLDR
This work provides the first benchmark dataset for visually-grounded natural language navigation in real buildings, the Room-to-Room (R2R) dataset, and presents the Matterport3D Simulator, a large-scale reinforcement learning environment based on real imagery.
Sim-to-Real Transfer of Robotic Control with Dynamics Randomization
TLDR
By randomizing the dynamics of the simulator during training, this paper is able to develop policies that are capable of adapting to very different dynamics, including ones that differ significantly from the dynamics on which the policies were trained.
Self-Supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation
TLDR
A generalized computation graph is proposed that subsumes value-based model-free methods and model-based methods, and is instantiated to form a navigation model that learns from raw images, is sample-efficient, and outperforms single-step and double-step double Q-learning.
Transferring End-to-End Visuomotor Control from Simulation to Real World for a Multi-Stage Task
TLDR
This paper shows how two simple techniques can lead to end-to-end (image to velocity) execution of a multi-stage task, which is analogous to a simple tidying routine, without having seen a single real image.
AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles
TLDR
A new simulator built on Unreal Engine is presented that offers physically and visually realistic simulations for autonomous vehicles in the real world, and that is designed from the ground up to be extensible to accommodate new types of vehicles, hardware platforms, and software protocols.
CAD²RL: Real Single-Image Flight without a Single Real Image
TLDR
This paper proposes a learning method called CAD²RL, which can be used to perform collision-free indoor flight in the real world while being trained entirely on 3D CAD models, and shows that it can train a policy that generalizes to the real world without requiring the simulator to be particularly realistic or high-fidelity.
Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping
TLDR
This work studies how randomized simulated environments and domain adaptation methods can be extended to train a grasping system to grasp novel objects from raw monocular RGB images, including a novel extension of pixel-level domain adaptation termed GraspGAN.