Corpus ID: 221370597

AllenAct: A Framework for Embodied AI Research

@article{Weihs2020AllenActAF,
  title={AllenAct: A Framework for Embodied AI Research},
  author={Luca Weihs and Jordi Salvador and Klemen Kotar and Unnat Jain and Kuo-Hao Zeng and Roozbeh Mottaghi and Aniruddha Kembhavi},
  journal={ArXiv},
  year={2020},
  volume={abs/2008.12760}
}
The domain of Embodied AI, in which agents learn to complete tasks through interaction with their environment from egocentric observations, has experienced substantial growth with the advent of deep reinforcement learning and increased interest from the computer vision, NLP, and robotics communities. This growth has been facilitated by the creation of a large number of simulated environments (such as AI2-THOR, Habitat and CARLA), tasks (like point navigation, instruction following, and embodied…
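
As a rough illustration of the interaction loop the abstract describes (an agent receiving egocentric observations from a simulator and emitting actions), the sketch below uses a toy environment and a random policy. The ToyEnvironment and RandomAgent classes are hypothetical stand-ins invented for this example; they are not AllenAct's API, nor the API of any simulator named above.

import random

class ToyEnvironment:
    """Hypothetical stand-in for a simulator such as AI2-THOR or Habitat."""

    def reset(self):
        self.steps = 0
        # A fake 4x4 "egocentric frame"; real simulators return rendered RGB images.
        return {"rgb": [[0.0] * 4 for _ in range(4)]}

    def step(self, action):
        self.steps += 1
        observation = {"rgb": [[random.random()] * 4 for _ in range(4)]}
        reward = 1.0 if action == "MoveAhead" else 0.0  # toy reward, for illustration only
        done = self.steps >= 10
        return observation, reward, done

class RandomAgent:
    """Placeholder policy: picks a navigation action uniformly at random."""

    actions = ["MoveAhead", "RotateLeft", "RotateRight", "End"]

    def act(self, observation):
        return random.choice(self.actions)

env, agent = ToyEnvironment(), RandomAgent()
obs, done, episode_return = env.reset(), False, 0.0
while not done:
    obs, reward, done = env.step(agent.act(obs))
    episode_return += reward
print(f"episode return: {episode_return}")

In a real setup, the toy classes would be replaced by a simulator wrapper and a learned policy, and the loop would be driven by a reinforcement-learning trainer rather than run by hand.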

Citations

Core Challenges in Embodied Vision-Language Planning
TLDR: A taxonomy is proposed to unify Embodied Vision-Language Planning (EVLP) tasks, a family of prominent embodied navigation and manipulation problems that jointly use computer vision and natural language, and an in-depth analysis and comparison of new and current algorithmic approaches, metrics, simulated environments, and datasets used for EVLP tasks is presented.
ManipulaTHOR: A Framework for Visual Object Manipulation
TLDR: This work proposes a framework for object manipulation built upon the physics-enabled, visually rich AI2-THOR framework and presents a new challenge to the Embodied AI community known as ArmPointNav, which extends the popular point navigation task to object manipulation and offers new challenges including 3D obstacle avoidance.
Visual Room Rearrangement
TLDR: The experiments show that solving this challenging interactive task, which involves navigation and object interaction, is beyond the capabilities of current state-of-the-art techniques for embodied tasks, and that perfect performance on these types of tasks remains far out of reach.
Communicative Learning with Natural Gestures for Embodied Navigation Agents with Human-in-the-Scene
TLDR: It is demonstrated that human gesture cues, even without predefined semantics, improve object-goal navigation for an embodied agent, outperforming various state-of-the-art methods.
VisualHints: A Visual-Lingual Environment for Multimodal Reinforcement Learning
TLDR: This work introduces an extension of the TextWorld cooking environment with the addition of visual clues interspersed throughout the environment to force an RL agent to use both text and visual features to predict natural language action commands for solving the final task of cooking a meal.
DASH: Modularized Human Manipulation Simulation with Vision and Language for Embodied AI
Creating virtual humans with embodied, human-like perceptual and actuation constraints promises to provide an integrated simulation platform for many scientific and engineering applications.
A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks
TLDR: The novel task FurnMove is introduced, in which agents work together to move a piece of furniture through a living room to a goal, and SYNC-policies (synchronize your actions coherently) and CORDIAL (coordination loss) are introduced.
Evaluating Continual Learning Algorithms by Generating 3D Virtual Environments
Continual learning refers to the ability of humans and animals to incrementally learn over time in a given environment. Trying to simulate this learning process in machines is a challenging task, …
MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation
TLDR: This work proposes the multiON task, which requires navigation to an episode-specific sequence of objects in a realistic environment; it generalizes the ObjectGoal navigation task and explicitly tests the ability of navigation agents to locate previously observed goal objects.
RobustNav: Towards Benchmarking Robustness in Embodied Navigation
TLDR: It is found that some standard embodied navigation agents significantly underperform in the presence of visual or dynamics corruptions, and that standard techniques to improve robustness, such as data augmentation and self-supervised adaptation, offer some zero-shot resistance and improvements in navigation performance.

References

Showing 1-10 of 63 references
RoboTHOR: An Open Simulation-to-Real Embodied AI Platform
TLDR: RoboTHOR offers a framework of simulated environments paired with physical counterparts to systematically explore and overcome the challenges of simulation-to-real transfer, and a platform where researchers across the globe can remotely test their embodied models in the physical world.
REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments
Yuankai Qi, Qi Wu, +4 authors, A. V. Hengel. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
TLDR: A dataset of varied and complex robot tasks, described in natural language in terms of objects visible in a large set of real images, is introduced, along with a novel Interactive Navigator-Pointer model that provides a strong baseline on the task.
Embodied Question Answering
TLDR: A new AI task is introduced in which an agent is spawned at a random location in a 3D environment and asked a question ('What color is the car?'); the agent must first intelligently navigate to explore the environment, gather the necessary visual information through first-person (egocentric) vision, and then answer the question.
Gibson Env: Real-World Perception for Embodied Agents
TLDR: This paper investigates developing real-world perception for active agents, proposes Gibson Environment for this purpose, and showcases a set of perceptual tasks learned therein.
RLBench: The Robot Learning Benchmark & Learning Environment
TLDR: This large-scale benchmark aims to accelerate progress in a number of vision-guided manipulation research areas, including reinforcement learning, imitation learning, multi-task learning, geometric computer vision, and, in particular, few-shot learning.
Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments
TLDR: This work provides the first benchmark dataset for visually grounded natural language navigation in real buildings, the Room-to-Room (R2R) dataset, and presents the Matterport3D Simulator, a large-scale reinforcement learning environment based on real imagery.
Visual Semantic Planning Using Deep Successor Representations
TLDR: This work addresses the problem of visual semantic planning, the task of predicting a sequence of actions from visual observations that transform a dynamic environment from an initial state to a goal state, and develops a deep predictive model based on successor representations.
Habitat: A Platform for Embodied AI Research
TLDR: The comparison between learning and SLAM approaches from two recent works is revisited, and evidence is found that learning outperforms SLAM if scaled to an order of magnitude more experience than previous investigations; the first cross-dataset generalization experiments are also conducted.
Two Body Problem: Collaborative Visual Task Completion
TLDR: This paper studies the problem of learning to collaborate directly from pixels in AI2-THOR and demonstrates the benefits of explicit and implicit modes of communication to perform visual tasks.
ViZDoom: A Doom-based AI research platform for visual reinforcement learning
TLDR: A novel test-bed platform for reinforcement learning research from raw visual information which employs the first-person perspective in a semi-realistic 3D world; the experiments confirm the utility of ViZDoom as an AI research platform and imply that visual reinforcement learning in realistic 3D first-person-perspective environments is feasible.