AllenAct: A Framework for Embodied AI Research
@article{Weihs2020AllenActAF,
  title   = {AllenAct: A Framework for Embodied AI Research},
  author  = {Luca Weihs and Jordi Salvador and Klemen Kotar and Unnat Jain and Kuo-Hao Zeng and Roozbeh Mottaghi and Aniruddha Kembhavi},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2008.12760}
}
The domain of Embodied AI, in which agents learn to complete tasks through interaction with their environment from egocentric observations, has experienced substantial growth with the advent of deep reinforcement learning and increased interest from the computer vision, NLP, and robotics communities. This growth has been facilitated by the creation of a large number of simulated environments (such as AI2-THOR, Habitat, and CARLA), tasks (like point navigation, instruction following, and embodied…
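The framework's central design point is decoupling tasks, environments, and models so that one generic rollout/training loop can drive any combination. Below is a minimal Python sketch of that kind of decoupling; the class and function names are illustrative assumptions, not AllenAct's actual API.

```python
# Sketch of the decoupling AllenAct advocates: tasks, environments, and
# models are swappable pieces driven by one generic rollout loop.
# All names here are illustrative assumptions, not the library's API.
import random
from typing import Protocol, Tuple


class Task(Protocol):
    def reset(self) -> Tuple[int, ...]: ...
    def step(self, action: int) -> Tuple[Tuple[int, ...], float, bool]: ...


class GridPointNav:
    """Toy stand-in for a point-navigation task on a 5x5 grid."""
    MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]

    def reset(self):
        self.pos, self.goal = (0, 0), (4, 4)
        return self.pos

    def step(self, action):
        dx, dy = self.MOVES[action]
        x = min(max(self.pos[0] + dx, 0), 4)
        y = min(max(self.pos[1] + dy, 0), 4)
        self.pos = (x, y)
        done = self.pos == self.goal
        return self.pos, (1.0 if done else -0.01), done


def rollout(task: Task, policy, max_steps: int = 200) -> float:
    """Generic loop: nothing here depends on which task or policy is used."""
    obs, total = task.reset(), 0.0
    for _ in range(max_steps):
        obs, reward, done = task.step(policy(obs))
        total += reward
        if done:
            break
    return total


print(rollout(GridPointNav(), policy=lambda obs: random.randrange(4)))
```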
41 Citations
GridToPix: Training Embodied Agents with Minimal Supervision
- Computer Science, 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2021
GridToPix is proposed to 1) train agents with terminal rewards in gridworlds that generically mirror Embodied AI environments, i.e., are independent of the task, and 2) distill the learned policy into agents that reside in complex visual worlds.
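The distillation stage can be summarized as a standard policy-distillation objective: the visual student matches the action distribution of the gridworld teacher. A minimal PyTorch sketch of that generic objective (not GridToPix's actual training code) follows.

```python
# Generic policy-distillation step: a gridworld "teacher" policy supervises
# a visual "student" via a KL loss on action distributions. This sketches
# the idea described above, not GridToPix's implementation.
import torch
import torch.nn.functional as F

def distillation_loss(teacher_logits: torch.Tensor,
                      student_logits: torch.Tensor) -> torch.Tensor:
    """KL(teacher || student) over the action distribution, batch-averaged."""
    teacher_probs = F.softmax(teacher_logits, dim=-1)
    student_logp = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(student_logp, teacher_probs, reduction="batchmean")

# Toy usage: a batch of 8 states, 4 discrete actions.
teacher_logits = torch.randn(8, 4)                       # gridworld policy
student_logits = torch.randn(8, 4, requires_grad=True)   # visual policy
loss = distillation_loss(teacher_logits, student_logits)
loss.backward()
print(float(loss))
```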
Core Challenges in Embodied Vision-Language Planning
- Computer Science, J. Artif. Intell. Res.
- 2022
This paper proposes a taxonomy to unify Embodied Vision-Language Planning (EVLP) tasks, a family of prominent embodied navigation and manipulation problems that jointly use computer vision and natural language. It presents the core challenges that new EVLP works should seek to address and advocates for task construction that enables model generalizability and furthers real-world deployment.
ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
- Computer Science, ArXiv
- 2022
ProcTHOR, a framework for the procedural generation of Embodied AI environments, is proposed; it enables sampling arbitrarily large datasets of diverse, interactive, customizable, and performant virtual environments to train and evaluate embodied agents across navigation, interaction, and manipulation tasks.
A General Purpose Supervisory Signal for Embodied Agents
- Computer Science, ArXiv
- 2022
The Scene Graph Contrastive (SGC) loss is proposed, which uses scene graphs as general-purpose, training-only supervisory signals and uses contrastive learning to align an agent's representation with a rich graphical encoding of its environment.
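Contrastive alignment of this kind is commonly implemented as a temperature-scaled InfoNCE objective over paired embeddings. The sketch below illustrates that general pattern, assuming a symmetric loss; it is not the paper's SGC implementation.

```python
# InfoNCE-style contrastive alignment between agent-state embeddings and
# scene-graph embeddings: the general pattern behind losses like SGC.
# Illustrative sketch only, not the paper's code.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(agent_emb: torch.Tensor,
                               graph_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Each agent embedding should match its own scene graph (the diagonal)
    and repel the other graphs in the batch."""
    a = F.normalize(agent_emb, dim=-1)
    g = F.normalize(graph_emb, dim=-1)
    logits = a @ g.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(a.size(0))       # positives on the diagonal
    # Symmetric loss: agent->graph and graph->agent directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

loss = contrastive_alignment_loss(torch.randn(16, 128), torch.randn(16, 128))
print(float(loss))
```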
ManipulaTHOR: A Framework for Visual Object Manipulation
- Computer Science, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2021
This work proposes a framework for object manipulation built upon the physics-enabled, visually rich AI2-THOR framework and presents ArmPointNav, a new challenge for the Embodied AI community that extends the popular point navigation task to object manipulation and introduces difficulties such as 3D obstacle avoidance.
Ask4Help: Learning to Leverage an Expert for Embodied Tasks
- Computer Science, ArXiv
- 2022
This paper proposes the Ask4Help policy, which augments agents with the ability to request, and then use, expert assistance, thereby reducing the cost of querying the expert.
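A common way to realize such a policy is to augment the agent's action space with an explicit "ask" action that substitutes the expert's action at a fixed query cost. The sketch below illustrates that setup; the names and the cost model are assumptions, not the paper's exact formulation.

```python
# Sketch of a "learn when to ask" setup: the action space gains an ASK
# action that returns the expert's action but incurs a fixed query cost.
# Names and the cost model are assumptions for illustration.
from typing import Callable

ASK = "ask_expert"

def step_with_expert(env_step: Callable[[str], tuple],
                     expert: Callable[[], str],
                     action: str,
                     query_cost: float = 0.05) -> tuple:
    """Wraps an environment step; ASK substitutes the expert's action
    and subtracts the query cost from the reward."""
    if action == ASK:
        obs, reward, done = env_step(expert())
        return obs, reward - query_cost, done
    return env_step(action)

# Toy usage with stub environment and expert:
obs, r, done = step_with_expert(
    env_step=lambda a: ({"last": a}, 1.0, False),
    expert=lambda: "move_ahead",
    action=ASK)
print(obs, r, done)  # {'last': 'move_ahead'} 0.95 False
```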
What do navigation agents learn about their environment?
- Computer Science, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2022
This paper introduces the Interpretability System for Embodied agEnts (iSEE) for Point Goal and Object Goal navigation agents and uses iSEE to probe the dynamic representations produced by these agents for the presence of information about the agent as well as the environment.
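Probing of this kind generally follows a standard recipe: freeze the agent, collect its hidden states, and fit a lightweight classifier that predicts an environment attribute; high held-out accuracy indicates the attribute is decodable from the representation. A generic sketch of that recipe (with synthetic stand-in data, not iSEE itself) is shown below.

```python
# Generic probing recipe behind tools like iSEE: freeze the agent, collect
# its hidden states, and fit a small classifier predicting an environment
# attribute (here, a binary stand-in label). Illustrative sketch only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(1000, 512))     # agent GRU states (stub)
labels = (hidden_states[:, 0] > 0).astype(int)   # attribute to probe (stub)

X_tr, X_te, y_tr, y_te = train_test_split(hidden_states, labels,
                                          random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
# High held-out accuracy => the attribute is linearly decodable.
print(f"probe accuracy: {probe.score(X_te, y_te):.2f}")
```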
Visual Room Rearrangement
- Computer Science, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2021
The experiments show that solving this challenging interactive task, which involves both navigation and object interaction, is beyond the capabilities of current state-of-the-art techniques for embodied tasks, and that perfect performance on tasks of this type remains far off.
ASC me to Do Anything: Multi-task Training for Embodied AI
- Computer Science, ArXiv
- 2022
Atomic Skill Completion (ASC) is proposed, an approach to multi-task training for Embodied AI in which a set of atomic skills shared across multiple tasks is composed to perform the tasks.
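Skill composition of this kind typically means a high-level plan selects among pre-trained atomic skill policies, each run until its own termination condition fires. The sketch below shows that control flow; the names and plan format are illustrative assumptions, not ASC's implementation.

```python
# Sketch of composing atomic skills: a high-level plan selects a skill,
# which then runs until its termination condition fires. Names are assumed
# for illustration; this is not the ASC training procedure itself.
from typing import Callable, Dict, List

Skill = Callable[[dict], tuple]  # obs -> (action, done_flag)

def run_composed_task(skills: Dict[str, Skill],
                      plan: List[str],
                      env_step: Callable[[str], dict],
                      obs: dict,
                      max_steps_per_skill: int = 50) -> dict:
    """Executes a sequence of atomic skills to complete a composite task."""
    for name in plan:
        skill = skills[name]
        for _ in range(max_steps_per_skill):
            action, done = skill(obs)
            obs = env_step(action)
            if done:  # skill reached its own subgoal
                break
    return obs

# Toy usage with stub skills that finish immediately:
final = run_composed_task(
    skills={"goto": lambda o: ("move_ahead", True),
            "pick": lambda o: ("pickup", True)},
    plan=["goto", "pick"],
    env_step=lambda a: {"last_action": a},
    obs={})
print(final)  # {'last_action': 'pickup'}
```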
Towards Disturbance-Free Visual Mobile Manipulation
- Computer Science, ArXiv
- 2021
This paper studies the problem of training agents to complete the task of visual mobile manipulation in the ManipulaTHOR environment while avoiding unnecessary collisions (disturbance) with objects, and proposes a two-stage training curriculum in which an agent is first allowed to explore freely and build basic competencies without penalization.
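The curriculum amounts to switching on a disturbance penalty only after the free-exploration stage. A sketch of a reward of that form follows; the stage boundary and penalty weight are assumed values for illustration, not the paper's exact settings.

```python
# Sketch of the two-stage curriculum described above: disturbance (unwanted
# object displacement) is only penalized in the second stage. The penalty
# weight and stage boundary are assumptions for illustration.
def shaped_reward(task_reward: float,
                  num_disturbed_objects: int,
                  training_step: int,
                  stage_one_steps: int = 1_000_000,
                  penalty_weight: float = 0.1) -> float:
    if training_step < stage_one_steps:
        return task_reward  # stage 1: explore freely, no penalty
    return task_reward - penalty_weight * num_disturbed_objects  # stage 2

print(shaped_reward(1.0, 3, training_step=500))        # 1.0 (no penalty yet)
print(shaped_reward(1.0, 3, training_step=2_000_000))  # 0.7
```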
References
Showing 1-10 of 57 references
RoboTHOR: An Open Simulation-to-Real Embodied AI Platform
- Computer Science, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
RoboTHOR offers a framework of simulated environments paired with physical counterparts to systematically explore and overcome the challenges of simulation-to-real transfer, and a platform where researchers across the globe can remotely test their embodied models in the physical world.
REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments
- Computer Science, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
A dataset of varied and complex robot tasks, described in natural language in terms of objects visible in a large set of real images, is presented, and a novel Interactive Navigator-Pointer model is proposed that provides a strong baseline on the task.
Gibson Env: Real-World Perception for Embodied Agents
- Computer Science, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
This paper investigates developing real-world perception for active agents, proposes Gibson Environment for this purpose, and showcases a set of perceptual tasks learned therein.
RLBench: The Robot Learning Benchmark & Learning Environment
- Computer Science, IEEE Robotics and Automation Letters
- 2020
This large-scale benchmark aims to accelerate progress in a number of vision-guided manipulation research areas, including: reinforcement learning, imitation learning, multi-task learning, geometric computer vision, and in particular, few-shot learning.
Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments
- Computer Science, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
This work provides the first benchmark dataset for visually-grounded natural language navigation in real buildings, the Room-to-Room (R2R) dataset, and presents the Matterport3D Simulator, a large-scale reinforcement learning environment based on real imagery.
Visual Semantic Planning Using Deep Successor Representations
- Computer Science, 2017 IEEE International Conference on Computer Vision (ICCV)
- 2017
This work addresses the problem of visual semantic planning: the task of predicting a sequence of actions from visual observations that transform a dynamic environment from an initial state to a goal state, and develops a deep predictive model based on successor representations.
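Successor representations factor value into expected discounted future features and a task-specific reward weight vector, so Q(s, a) = w · ψ(s, a). The sketch below shows that standard decomposition with stub features; it illustrates the general machinery, not the paper's deep predictive model.

```python
# Standard successor-representation decomposition this line of work builds
# on: psi(s, a) accumulates expected discounted future features phi, and
# the Q-value is a dot product with reward weights w. Stub numbers only.
import numpy as np

def successor_features(phi_trajectory: np.ndarray,
                       gamma: float = 0.99) -> np.ndarray:
    """psi = sum_t gamma^t * phi(s_t) along one sampled trajectory."""
    discounts = gamma ** np.arange(len(phi_trajectory))
    return (discounts[:, None] * phi_trajectory).sum(axis=0)

phi = np.random.default_rng(0).normal(size=(10, 4))  # 10 steps, 4-d features
w = np.array([1.0, 0.0, -0.5, 0.2])                  # task reward weights
psi = successor_features(phi)
print("Q(s, a) ~", float(psi @ w))  # value = reward weights . successor features
```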
Habitat: A Platform for Embodied AI Research
- Computer Science, 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
The comparison between learning and SLAM approaches from two recent works is revisited, and evidence is found that learning outperforms SLAM if scaled to an order of magnitude more experience than in previous investigations; the first cross-dataset generalization experiments are also conducted.
Two Body Problem: Collaborative Visual Task Completion
- Computer Science, Art, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
This paper studies the problem of learning to collaborate directly from pixels in AI2-THOR and demonstrates the benefits of explicit and implicit modes of communication to perform visual tasks.
IQA: Visual Question Answering in Interactive Environments
- Computer Science, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
The Hierarchical Interactive Memory Network (HIMN) is proposed, consisting of a factorized set of controllers that allow the system to operate at multiple levels of temporal abstraction; it outperforms popular single-controller methods on IQUAD V1.
SAPIEN: A SimulAted Part-Based Interactive ENvironment
- Computer Science, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
SAPIEN is a realistic and physics-rich simulated environment that hosts a large-scale set of articulated objects, enabling various robotic vision and interaction tasks that require detailed part-level understanding; the authors hope it will open research directions yet to be explored.