Corpus ID: 221370597

AllenAct: A Framework for Embodied AI Research

@article{Weihs2020AllenActAF,
  title={AllenAct: A Framework for Embodied AI Research},
  author={Luca Weihs and Jordi Salvador and Klemen Kotar and Unnat Jain and Kuo-Hao Zeng and Roozbeh Mottaghi and Aniruddha Kembhavi},
  journal={ArXiv},
  year={2020},
  volume={abs/2008.12760}
}
The domain of Embodied AI, in which agents learn to complete tasks through interaction with their environment from egocentric observations, has experienced substantial growth with the advent of deep reinforcement learning and increased interest from the computer vision, NLP, and robotics communities. This growth has been facilitated by the creation of a large number of simulated environments (such as AI2-THOR, Habitat and CARLA), tasks (like point navigation, instruction following, and embodied… 
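
To make the framework's role concrete, below is a minimal, hypothetical sketch of how an experiment might be declared in a framework of this kind. The class and method names (an experiment configuration exposing tag, create_model, and training_pipeline) follow the general pattern AllenAct documents, but the exact signatures and import paths are assumptions here; consult the AllenAct documentation for the actual API.

    import torch.nn as nn

    class PointNavBaselineExperiment:
        """Hypothetical experiment configuration: bundles the experiment tag,
        the agent's model, and the training recipe into a single object,
        mirroring how AllenAct decouples tasks, environments, models, and
        training algorithms."""

        @classmethod
        def tag(cls) -> str:
            # Unique name used to label checkpoints and logs.
            return "PointNavBaseline"

        @classmethod
        def create_model(cls, **kwargs) -> nn.Module:
            # Stand-in actor-critic head over a 512-dim observation embedding
            # and 4 discrete actions; a real experiment would use a visual
            # encoder over egocentric RGB(-D) observations.
            return nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 4))

        @classmethod
        def training_pipeline(cls, **kwargs):
            # In AllenAct this is where training algorithms (e.g. PPO,
            # DAgger-style imitation) and their schedule are specified;
            # left as a placeholder in this sketch.
            raise NotImplementedError("Illustrative placeholder only.")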

Citations

GridToPix: Training Embodied Agents with Minimal Supervision

GridToPix is proposed: 1) train agents with terminal rewards in gridworlds that generically mirror Embodied AI environments, i.e., gridworlds that are independent of the task; 2) distill the learned policy into agents that reside in complex visual worlds.

Core Challenges in Embodied Vision-Language Planning

This paper proposes a taxonomy to unify Embodied Vision-Language Planning tasks, a family of prominent embodied navigation and manipulation problems that jointly use computer vision and natural language, and presents the core challenges that new EVLP works should seek to address and advocates for task construction that enables model generalizability and furthers real-world deployment.

ProcTHOR: Large-Scale Embodied AI Using Procedural Generation

ProcTHOR, a framework for procedural generation of Embodied AI environments, is proposed; it enables sampling arbitrarily large datasets of diverse, interactive, customizable, and performant virtual environments to train and evaluate embodied agents across navigation, interaction, and manipulation tasks.

A General Purpose Supervisory Signal for Embodied Agents

The Scene Graph Contrastive (SGC) loss is proposed, which uses scene graphs as general-purpose, training-only supervisory signals and uses contrastive learning to align an agent’s representation with a rich graphical encoding of its environment.

ManipulaTHOR: A Framework for Visual Object Manipulation

This work proposes a framework for object manipulation built upon the physics-enabled, visually rich AI2-THOR framework and presents a new challenge to the Embodied AI community known as ArmPointNav, which extends the popular point navigation task to object manipulation and offers new challenges including 3D obstacle avoidance.

Ask4Help: Learning to Leverage an Expert for Embodied Tasks

This paper proposes the Ask4Help policy, which augments agents with the ability to request, and then use, expert assistance, thereby reducing the cost of querying the expert.

What do navigation agents learn about their environment?

This paper introduces the Interpretability System for Embodied agEnts (iSEE) for Point Goal and Object Goal navigation agents and uses iSEE to probe the dynamic representations produced by these agents for the presence of information about the agent as well as the environment.

Visual Room Rearrangement

The experiments show that solving this challenging interactive task, which involves both navigation and object interaction, is beyond the capabilities of current state-of-the-art techniques for embodied tasks, and that performance remains far from perfect on tasks of this type.

ASC me to Do Anything: Multi-task Training for Embodied AI

Atomic Skill Completion (ASC) is proposed, an approach to multi-task training for Embodied AI in which a set of atomic skills shared across multiple tasks is composed to perform the tasks.

Towards Disturbance-Free Visual Mobile Manipulation

This paper studies the problem of training agents to complete the task of visual mobile manipulation in the ManipulaTHOR environment while avoiding unnecessary collision (disturbance) with objects, and proposes a two-stage training curriculum where an agent is first allowed to freely explore and build basic competencies without penalization.

References

Showing 1–10 of 57 references

RoboTHOR: An Open Simulation-to-Real Embodied AI Platform

RoboTHOR offers a framework of simulated environments paired with physical counterparts to systematically explore and overcome the challenges of simulation-to-real transfer, and a platform where researchers across the globe can remotely test their embodied models in the physical world.

REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments

  • Yuankai Qi, Qi Wu, A. van den Hengel
  • Computer Science
  • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2020
A dataset of varied and complex robot tasks, described in natural language in terms of objects visible in a large set of real images, is introduced, and a novel Interactive Navigator-Pointer model is proposed that provides a strong baseline on the task.

Gibson Env: Real-World Perception for Embodied Agents

This paper investigates developing real-world perception for active agents, proposes Gibson Environment for this purpose, and showcases a set of perceptual tasks learned therein.

RLBench: The Robot Learning Benchmark & Learning Environment

This large-scale benchmark aims to accelerate progress in a number of vision-guided manipulation research areas, including: reinforcement learning, imitation learning, multi-task learning, geometric computer vision, and in particular, few-shot learning.

Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments

This work provides the first benchmark dataset for visually-grounded natural language navigation in real buildings, the Room-to-Room (R2R) dataset, and presents the Matterport3D Simulator, a large-scale reinforcement learning environment based on real imagery.

Visual Semantic Planning Using Deep Successor Representations

This work addresses the problem of visual semantic planning: the task of predicting a sequence of actions from visual observations that transform a dynamic environment from an initial state to a goal state, and develops a deep predictive model based on successor representations.

Habitat: A Platform for Embodied AI Research

The comparison between learning and SLAM approaches from two recent works is revisited, and evidence is found that learning outperforms SLAM when scaled to an order of magnitude more experience than in previous investigations; the first cross-dataset generalization experiments are also conducted.

Two Body Problem: Collaborative Visual Task Completion

This paper studies the problem of learning to collaborate directly from pixels in AI2-THOR and demonstrates the benefits of explicit and implicit modes of communication to perform visual tasks.

IQA: Visual Question Answering in Interactive Environments

The Hierarchical Interactive Memory Network (HIMN) is proposed, consisting of a factorized set of controllers that allow the system to operate at multiple levels of temporal abstraction; it outperforms popular single-controller methods on IQUAD V1.

SAPIEN: A SimulAted Part-Based Interactive ENvironment

  • Fanbo Xiang, Yuzhe Qin, Hao Su
  • Computer Science
  • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2020
SAPIEN is a realistic and physics-rich simulated environment hosting a large-scale set of articulated objects; it enables various robotic vision and interaction tasks that require detailed part-level understanding, and the authors hope it will open research directions yet to be explored.
...