Corpus ID: 226246150

Rearrangement: A Challenge for Embodied AI

@article{Batra2020RearrangementAC,
  title={Rearrangement: A Challenge for Embodied AI},
  author={Dhruv Batra and Angel X. Chang and Sonia Chernova and Andrew J. Davison and Jia Deng and Vladlen Koltun and Sergey Levine and Jitendra Malik and Igor Mordatch and Roozbeh Mottaghi and Manolis Savva and Hao Su},
  journal={ArXiv},
  year={2020},
  volume={abs/2011.01975}
}
We describe a framework for research and evaluation in Embodied AI. Our proposal is based on a canonical task: Rearrangement. A standard task can focus the development of new techniques and serve as a source of trained models that can be transferred to other settings. In the rearrangement task, the goal is to bring a given physical environment into a specified state. The goal state can be specified by object poses, by images, by a description in language, or by letting the agent experience the…
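The abstract mentions that a goal state can be specified by object poses. A minimal sketch of what evaluating such a geometric goal could look like is below; the function name, the pose representation (positions only, no orientation), and the 5 cm tolerance are illustrative assumptions, not the paper's actual evaluation protocol.

```python
import math

def rearrangement_success(current_poses, goal_poses, tolerance=0.05):
    """Return the fraction of objects placed within `tolerance` meters
    of their goal position (1.0 means every object was rearranged).

    `current_poses` and `goal_poses` map object IDs to (x, y, z) tuples.
    """
    placed = 0
    for obj_id, goal in goal_poses.items():
        cur = current_poses[obj_id]
        if math.dist(cur, goal) <= tolerance:  # Euclidean distance
            placed += 1
    return placed / len(goal_poses)

# Hypothetical scene: the mug is within 5 cm of its goal, the book is not.
goal = {"mug": (0.0, 0.0, 0.9), "book": (1.0, 0.5, 0.4)}
state = {"mug": (0.01, 0.02, 0.9), "book": (1.4, 0.5, 0.4)}
print(rearrangement_success(state, goal))  # → 0.5
```

Richer goal specifications (images, language) would replace the per-object distance check with a learned or task-specific success predicate.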
Visual Room Rearrangement
TLDR: The experiments show that solving this challenging interactive task, which involves both navigation and object interaction, is beyond the capabilities of current state-of-the-art techniques for embodied tasks; the authors remain far from perfect performance on tasks of this kind.
NeRP: Neural Rearrangement Planning for Unknown Objects
TLDR: NeRP (Neural Rearrangement Planning) is a deep-learning-based approach to multi-step object rearrangement planning that works with never-before-seen objects; it is trained on simulation data and generalizes to the real world.
Core Challenges in Embodied Vision-Language Planning
TLDR: A taxonomy is proposed to unify Embodied Vision-Language Planning (EVLP) tasks, a family of prominent embodied navigation and manipulation problems that jointly use computer vision and natural language, together with an in-depth analysis and comparison of current algorithmic approaches, metrics, simulated environments, and datasets used for EVLP tasks.
Embodied BERT: A Transformer Model for Embodied, Language-guided Visual Task Completion
TLDR: Embodied BERT (EmBERT) is a transformer-based model that attends to high-dimensional, multi-modal inputs across long temporal horizons for language-conditioned task completion, bridging the gap between successful object-centric navigation models for non-interactive agents and the language-guided visual task completion benchmark ALFRED.
The ThreeDWorld Transport Challenge: A Visually Guided Task-and-Motion Planning Benchmark for Physically Realistic Embodied AI
TLDR: This benchmark challenge is built using the ThreeDWorld simulation: a virtual 3D environment in which all objects respond to physics and agents can be controlled through a fully physics-driven navigation and interaction API.
Megaverse: Simulating Embodied Agents at One Million Experiences per Second
TLDR: The engine's efficient design enables physics-based simulation with high-dimensional egocentric observations at more than 1,000,000 actions per second on a single 8-GPU node, taking full advantage of the massive parallelism of modern GPUs.
Learning to Explore, Navigate and Interact for Visual Room Rearrangement
Intelligent agents for visual room rearrangement aim to reach a goal room configuration from a cluttered room configuration via a sequence of interactions. For successful visual room rearrangement, …
ManipulaTHOR: A Framework for Visual Object Manipulation
TLDR: A framework for object manipulation is proposed, built upon the physics-enabled, visually rich AI2-THOR framework, along with a new challenge for the Embodied AI community, ArmPointNav, which extends the popular point navigation task to object manipulation and introduces new difficulties such as 3D obstacle avoidance.
Shaping embodied agent behavior with activity-context priors from egocentric video
TLDR: An approach is introduced to discover activity-context priors from in-the-wild egocentric video captured with human-worn cameras, encoding the video-based prior as an auxiliary reward function that encourages an agent to bring compatible objects together before attempting an interaction.
Evaluating model-based planning and planner amortization for continuous control
TLDR: Well-tuned model-free agents are found to be strong baselines even for high-DoF control problems, but MPC with learned proposals and models can significantly improve performance and data efficiency in hard multi-task/multi-goal settings.

References

Showing 1–10 of 84 references
RLBench: The Robot Learning Benchmark & Learning Environment
TLDR: This large-scale benchmark aims to accelerate progress in a number of vision-guided manipulation research areas, including reinforcement learning, imitation learning, multi-task learning, geometric computer vision, and in particular few-shot learning.
On Evaluation of Embodied Navigation Agents
TLDR: The document summarizes the consensus recommendations of a working group on empirical methodology in navigation research: it discusses different problem statements and the role of generalization, presents evaluation measures, and provides standard scenarios that can be used for benchmarking.
SAPIEN: A SimulAted Part-Based Interactive ENvironment
  • Fanbo Xiang, Yuzhe Qin, +11 authors, Hao Su
  • Computer Science
  • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2020
TLDR: SAPIEN is a realistic, physics-rich simulated environment hosting a large-scale set of articulated objects; it enables robotic vision and interaction tasks that require detailed part-level understanding, and the authors hope it will open research directions yet to be explored.
The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract)
TLDR: The promise of ALE is illustrated by developing and benchmarking domain-independent agents designed with well-established AI techniques for both reinforcement learning and planning, and an evaluation methodology made possible by ALE is proposed.
Habitat: A Platform for Embodied AI Research
TLDR: The comparison between learning and SLAM approaches from two recent works is revisited, finding evidence that learning outperforms SLAM when scaled to an order of magnitude more experience than in previous investigations; the first cross-dataset generalization experiments are also conducted.
Gibson Env: Real-World Perception for Embodied Agents
TLDR: This paper investigates real-world perception for active agents, proposes the Gibson Environment for this purpose, and showcases a set of perceptual tasks learned within it.
AI2-THOR: An Interactive 3D Environment for Visual AI
TLDR: AI2-THOR consists of near photo-realistic 3D indoor scenes in which AI agents can navigate and interact with objects to perform tasks, facilitating the development of visually intelligent models.
A Concise Introduction to Models and Methods for Automated Planning
TLDR: The goal is to provide a modern, coherent view of planning that is precise, concise, and mostly self-contained, without being shallow.
MINOS: Multimodal Indoor Simulator for Navigation in Complex Environments
TLDR: MINOS is used to benchmark deep-learning-based navigation methods, to analyze the influence of environmental complexity on navigation performance, and to carry out a controlled study of multimodality in sensorimotor learning.
The RoboCup Synthetic Agent Challenge 97
TLDR: Three specific challenges are presented for the next two years of the RoboCup Challenge: learning of individual agents and teams; multi-agent team planning and plan execution in service of teamwork; and opponent modeling.