Corpus ID: 28328610

AI2-THOR: An Interactive 3D Environment for Visual AI

@article{Kolve2017AI2THORAI,
  title={AI2-THOR: An Interactive 3D Environment for Visual AI},
  author={Eric Kolve and Roozbeh Mottaghi and Winson Han and Eli VanderBilt and Luca Weihs and Alvaro Herrasti and Daniel Gordon and Yuke Zhu and Abhinav Gupta and Ali Farhadi},
  journal={ArXiv},
  year={2017},
  volume={abs/1712.05474}
}
We introduce The House Of inteRactions (THOR), a framework for visual AI research, available at https://ai2thor.allenai.org. AI2-THOR consists of near photo-realistic 3D indoor scenes, where AI agents can navigate in the scenes and interact with objects to perform tasks. AI2-THOR enables research in many different domains including but not limited to deep reinforcement learning, imitation learning, learning by interaction, planning, visual question answering, unsupervised representation learning, object…
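To make the interaction model concrete, here is a minimal agent loop written against the publicly released ai2thor Python package; the scene name, action strings, and metadata fields below follow that package's documented Controller API and are illustrative choices, not details from the abstract.

```python
# Minimal AI2-THOR interaction loop (sketch; assumes the `ai2thor`
# pip package and a local display or headless rendering setup).
from ai2thor.controller import Controller

controller = Controller(scene="FloorPlan1")   # load a kitchen scene

# Navigate: each discrete action returns an event with a fresh frame.
event = controller.step(action="MoveAhead")
event = controller.step(action="RotateRight")

# Interact: pick up a visible, pickupable object by its id.
candidates = [obj for obj in event.metadata["objects"]
              if obj["pickupable"] and obj["visible"]]
if candidates:
    event = controller.step(action="PickupObject",
                            objectId=candidates[0]["objectId"])

print(event.frame.shape)  # egocentric RGB observation, (height, width, 3)
controller.stop()
```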
Citations

Visual Semantic Navigation using Scene Priors
TLDR: This work proposes using Graph Convolutional Networks to incorporate prior knowledge into a deep reinforcement learning framework, and shows that semantic knowledge significantly improves performance and generalization to unseen scenes and/or objects.
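As a rough illustration of the propagation step such a method relies on, the sketch below implements a single graph-convolution layer over an object-relation graph; the normalization and the toy usage are assumptions for illustration, not details from the summary above.

```python
# One graph-convolution step H' = ReLU(A_hat @ H @ W) over an
# object-relation prior graph (illustrative sketch, not the paper's model).
import torch

def gcn_layer(a_hat: torch.Tensor, h: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """a_hat: (N, N) normalized adjacency with self-loops,
    h: (N, F_in) node features, w: (F_in, F_out) learnable weights."""
    return torch.relu(a_hat @ h @ w)

# Toy usage: 5 object categories, 16-dim features, 8-dim output.
a_hat = torch.eye(5)               # stand-in for a real relation graph
h = torch.randn(5, 16)
w = torch.randn(16, 8, requires_grad=True)
out = gcn_layer(a_hat, h, w)       # (5, 8) knowledge-conditioned features
```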
MaAST: Map Attention with Semantic Transformers for Efficient Visual Navigation
TLDR: This work proposes encoding vital scene semantics (traversable paths, unexplored areas, and observed scene objects), alongside raw visual streams such as RGB, depth, and semantic segmentation masks, into a semantically informed, top-down egocentric map representation, and introduces a novel 2-D map attention mechanism.
Embodied Visual Active Learning for Semantic Segmentation
TLDR: This work extensively evaluates the proposed models using the photorealistic Matterport3D simulator and shows that a fully learnt method outperforms comparable pre-specified counterparts, even when requesting fewer annotations.
Environment Predictive Coding for Embodied Agents
TLDR: The environment predictive coding method is introduced, a self-supervised approach to learning environment-level representations for embodied agents that outperforms the state-of-the-art on challenging tasks with only a limited budget of experience.
Improving Target-driven Visual Navigation with Attention on 3D Spatial Relationships
TLDR: This paper investigates target-driven visual navigation using deep reinforcement learning (DRL) in 3D indoor scenes, where the navigation task is to train an agent that can intelligently make a series of decisions to arrive at a pre-specified target location from any possible starting position, based only on egocentric views.
SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation
We propose SplitNet, a method for decoupling visual perception and policy learning. By incorporating auxiliary tasks and selective learning of portions of the model, we explicitly decompose the…
AllenAct: A Framework for Embodied AI Research
TLDR: AllenAct is introduced, a modular and flexible learning framework designed with a focus on the unique requirements of Embodied AI research that provides first-class support for a growing collection of embodied environments, tasks and algorithms.
Utilising Prior Knowledge for Visual Navigation: Distil and Adapt
TLDR: This paper proposes to decompose the value function in the actor-critic reinforcement learning algorithm and to incorporate the prior into the critic in a novel way that reduces model complexity and improves generalisation.
Visual Navigation using Deep Reinforcement Learning
Deep reinforcement learning (RL) has been successfully applied to a variety of game-like environments. However, the application of deep RL to visual navigation with realistic 3D environments is a…

References

Showing 1-10 of 29 references
Target-driven visual navigation in indoor scenes using deep reinforcement learning
TLDR: This paper proposes an actor-critic model whose policy is a function of the goal as well as the current state, which allows better generalization, and proposes the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine.
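The goal-conditioned idea in this TLDR can be sketched as follows; the feature dimensions and fusion layer are hypothetical stand-ins for illustration, not the paper's actual siamese architecture or training details.

```python
# Schematic goal-conditioned actor-critic head: the policy sees both the
# current observation and the navigation target (sketch only).
import torch
import torch.nn as nn

class GoalConditionedActorCritic(nn.Module):
    def __init__(self, feat_dim: int = 2048, hidden: int = 512, n_actions: int = 4):
        super().__init__()
        # Fusing (state, goal) features lets one network serve many targets,
        # which is what enables generalization to new goals.
        self.fuse = nn.Sequential(nn.Linear(2 * feat_dim, hidden), nn.ReLU())
        self.actor = nn.Linear(hidden, n_actions)  # action logits
        self.critic = nn.Linear(hidden, 1)         # state-value estimate

    def forward(self, obs_feat: torch.Tensor, goal_feat: torch.Tensor):
        h = self.fuse(torch.cat([obs_feat, goal_feat], dim=-1))
        return self.actor(h), self.critic(h)

# Toy usage with random features standing in for CNN embeddings.
model = GoalConditionedActorCritic()
logits, value = model(torch.randn(1, 2048), torch.randn(1, 2048))
```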
Visual Semantic Navigation using Scene Priors
TLDR: This work proposes using Graph Convolutional Networks to incorporate prior knowledge into a deep reinforcement learning framework, and shows that semantic knowledge significantly improves performance and generalization to unseen scenes and/or objects.
Visual Semantic Planning Using Deep Successor Representations
TLDR: This work addresses the problem of visual semantic planning: the task of predicting a sequence of actions from visual observations that transform a dynamic environment from an initial state to a goal state, and develops a deep predictive model based on successor representations.
IQA: Visual Question Answering in Interactive Environments
TLDR: The Hierarchical Interactive Memory Network (HIMN), consisting of a factorized set of controllers that allows the system to operate at multiple levels of temporal abstraction, is proposed and shown to outperform popular single-controller methods on IQUAD V1.
ViZDoom: A Doom-based AI research platform for visual reinforcement learning
TLDR: A novel test-bed platform for reinforcement learning research from raw visual information, which employs the first-person perspective in a semi-realistic 3D world; the results confirm the utility of ViZDoom as an AI research platform and imply that visual reinforcement learning in realistic 3D first-person environments is feasible.
Building Generalizable Agents with a Realistic and Rich 3D Environment
TLDR: House3D is built: a rich, extensible and efficient environment containing 45,622 human-designed 3D scenes of houses, equipped with a diverse set of fully labeled 3D objects, textures and scene layouts, based on the SUNCG dataset, with an emphasis on semantic-level generalization.
HoME: a Household Multimodal Environment
TLDR: HoME is an open-source, OpenAI Gym-compatible platform extensible to tasks in reinforcement learning, language grounding, sound-based navigation, robotics, multi-agent learning, and more, which better enables artificial agents to learn as humans do: in an interactive, multimodal, and richly contextualized setting.
MINOS: Multimodal Indoor Simulator for Navigation in Complex Environments
TLDR: MINOS is used to benchmark deep-learning-based navigation methods, to analyze the influence of environmental complexity on navigation performance, and to carry out a controlled study of multimodality in sensorimotor learning.
SceneNet: An annotated model generator for indoor scene understanding
We introduce SceneNet, a framework for generating high-quality annotated 3D scenes to aid indoor scene understanding. SceneNet leverages manually-annotated datasets of real world scenes such as NYUv2…
SeGAN: Segmenting and Generating the Invisible
TLDR: This paper studies the challenging problem of completing the appearance of occluded objects and proposes a novel solution, SeGAN, which outperforms state-of-the-art segmentation baselines for the invisible parts of objects.