AI2-THOR: An Interactive 3D Environment for Visual AI
TLDR: AI2-THOR consists of near photo-realistic 3D indoor scenes in which AI agents can navigate and interact with objects to perform tasks, facilitating the development of visually intelligent models.
Target-driven visual navigation in indoor scenes using deep reinforcement learning
TLDR: This paper proposes an actor-critic model whose policy is a function of the goal as well as the current state, allowing better generalization, and introduces the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine.
A Diagram is Worth a Dozen Images
TLDR: Devises an LSTM-based method for syntactic parsing of diagrams and a DPG-based attention model for diagram question answering, and compiles a new dataset of diagrams with exhaustive annotations of constituents and relationships.
RoboTHOR: An Open Simulation-to-Real Embodied AI Platform
TLDR: RoboTHOR offers a framework of simulated environments paired with physical counterparts to systematically explore and overcome the challenges of simulation-to-real transfer, and a platform where researchers across the globe can remotely test their embodied models in the physical world.
Visual Semantic Planning Using Deep Successor Representations
TLDR: This work addresses the problem of visual semantic planning — the task of predicting a sequence of actions from visual observations that transform a dynamic environment from an initial state to a goal state — and develops a deep predictive model based on successor representations.
A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks
TLDR: Introduces the novel task FurnMove, in which agents work together to move a piece of furniture through a living room to a goal, along with SYNC-policies (synchronize your actions coherently) and CORDIAL (coordination loss).
Two Body Problem: Collaborative Visual Task Completion
TLDR: This paper studies the problem of learning to collaborate directly from pixels in AI2-THOR and demonstrates the benefits of explicit and implicit modes of communication for performing visual tasks.
Learning Generalizable Visual Representations via Interactive Gameplay
TLDR: This work is the first to show that embodied adversarial reinforcement learning agents playing cache, a variant of hide-and-seek, in a high-fidelity interactive environment learn representations of their observations that encode information such as occlusion, object permanence, free space, and containment.
ManipulaTHOR: A Framework for Visual Object Manipulation
TLDR: This work proposes a framework for object manipulation built upon the physics-enabled, visually rich AI2-THOR framework and presents a new challenge to the Embodied AI community, ArmPointNav, which extends the popular point navigation task to object manipulation and introduces new challenges including 3D obstacle avoidance.