Corpus ID: 236957374

BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments

@article{Srivastava2021BEHAVIORBF,
  title={BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments},
  author={Sanjana Srivastava and Chengshu Li and Michael Lingelbach and Roberto Mart{\'i}n-Mart{\'i}n and Fei Xia and Kent Vainio and Zheng Lian and Cem Gokmen and S. Buch and C. Karen Liu and Silvio Savarese and Hyowon Gweon and Jiajun Wu and Li Fei-Fei},
  journal={ArXiv},
  year={2021},
  volume={abs/2108.03332}
}
Computer Science, Psychology · Institute for Human-Centered AI (HAI), Stanford University

Abstract: We introduce BEHAVIOR, a benchmark for embodied AI with 100 activities in simulation, spanning a range of everyday household chores such as cleaning, maintenance, and food preparation. These activities are designed to be realistic, diverse, and complex, aiming to reproduce the challenges that agents must face in the real world. Building such a benchmark poses three fundamental difficulties for each activity: de…
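BEHAVIOR defines each activity's initial and goal conditions in a logic-based description language (BDDL). A rough Python analogue of such a specification and its goal check, as a sketch only, with every predicate and object name invented for illustration rather than taken from the benchmark:

```python
# Sketch of a logic-style activity spec in the spirit of BEHAVIOR's BDDL;
# predicate and object names here are illustrative, not from the benchmark.
activity = {
    "name": "cleaning_the_table",
    "init": [("ontop", "plate_1", "table_1"), ("stained", "table_1")],
    "goal": [("not", ("stained", "table_1")),
             ("inside", "plate_1", "sink_1")],
}

def goal_satisfied(state, goal):
    """Check each goal literal against a dict of ground predicates."""
    for literal in goal:
        if literal[0] == "not":
            if state.get(literal[1], False):
                return False
        elif not state.get(literal, False):
            return False
    return True

# A simulator state in which the goal holds:
state = {("stained", "table_1"): False,
         ("inside", "plate_1", "sink_1"): True}
assert goal_satisfied(state, activity["goal"])
```

Real BDDL definitions also support quantifiers and sampled object scopes; this sketch covers only ground literals.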
BEHAVIOR in Habitat 2.0: Simulator-Independent Logical Task Description for Benchmarking Embodied AI Agents
TLDR
This work brings 45 of the 100 BEHAVIOR activities, those involving only kinematic states, into Habitat 2.0 (H2.0) to benefit from its fast simulation speed, as a first step towards demonstrating how easily activities defined in logic space can be adapted to different simulators; in the process it equips H2.0 with an even richer set of iGibson 2.0 interactive scenes and assets.
Housekeep: Tidying Virtual Households using Commonsense Reasoning
TLDR
A modular baseline approach for Housekeep is proposed that leverages a fine-tuned large language model (LLM), trained on an internet text corpus, for effective planning, and the baseline agent is shown to generalize to rearranging unseen objects in unknown environments.
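Schematically, the baseline's LLM decides where an object plausibly belongs. A minimal sketch of that planning step, with `llm_score` a hypothetical stand-in for the fine-tuned model, not the paper's code:

```python
def plan_placements(objects, receptacles, llm_score):
    """Greedy tidying plan: send each misplaced object to the receptacle
    that the language model scores as its most plausible home.
    `llm_score` is a placeholder for the fine-tuned LLM's compatibility score."""
    return [(obj, max(receptacles, key=lambda r: llm_score(obj, r)))
            for obj in objects]

def toy_score(obj, receptacle):
    """Stand-in scorer: word overlap between object and receptacle names."""
    return len(set(obj.split("_")) & set(receptacle.split("_")))

print(plan_placements(["coffee_mug"],
                      ["kitchen_cabinet", "mug_rack", "sofa"], toy_score))
# [('coffee_mug', 'mug_rack')]
```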
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
TLDR
This work introduces MINEDOJO, a new framework built on the popular Minecraft game that features a simulation suite with thousands of diverse open-ended tasks and an internet-scale knowledge base with Minecraft videos, tutorials, wiki pages, and forum discussions, and proposes a novel agent learning algorithm that leverages large pre-trained video-language models as a learned reward function.
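The learned reward function can be pictured as a similarity score between the agent's recent frames and the task prompt. A toy sketch under that reading, with both encoder towers replaced by placeholders rather than the actual pre-trained video-language model:

```python
import numpy as np

def embed_text(prompt: str) -> np.ndarray:
    """Placeholder text tower: a deterministic pseudo-embedding of the prompt.
    In MineDojo this role is played by a large pre-trained video-language model."""
    rng = np.random.default_rng(sum(map(ord, prompt)))
    v = rng.standard_normal(512)
    return v / np.linalg.norm(v)

def embed_video(frames: np.ndarray) -> np.ndarray:
    """Placeholder video tower over a window of recent frames."""
    v = frames.astype(float).ravel()[:512]
    v = np.pad(v, (0, 512 - v.size))
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def language_reward(frames: np.ndarray, prompt: str) -> float:
    """Dense learned reward: cosine similarity between the video window
    and the natural-language task description."""
    return float(embed_video(frames) @ embed_text(prompt))
```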
VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation
TLDR
This work introduces an Automatic Manipulation Solver simulator and builds a Vision-and-Language Manipulation benchmark containing varied language instructions over categorized robotic manipulation tasks, and develops a keypoint-based model, 6D-CLIPort, that handles multi-view observations and language input and outputs a sequence of 6-degree-of-freedom (DoF) actions.
ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
TLDR
The proposed PROCTHOR, a framework for procedural generation of Embodied AI environments, enables us to sample arbitrarily large datasets of diverse, interactive, customizable, and performant virtual environments to train and evaluate embodied agents across navigation, interaction, and manipulation tasks.
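The core idea is that environments are sampled from a generator rather than hand-built. A toy sketch of such a sampling interface; the room types, counts, and materials here are invented, not PROCTHOR's actual specification:

```python
import random

def sample_house(seed: int) -> dict:
    """Toy procedural generator: draw a house spec from a seeded RNG."""
    rng = random.Random(seed)
    room_types = ["kitchen", "bedroom", "bathroom", "living_room"]
    rooms = rng.sample(room_types, k=rng.randint(2, 4))
    return {
        "rooms": rooms,
        "objects_per_room": {room: rng.randint(3, 10) for room in rooms},
        "wall_material": rng.choice(["drywall", "brick", "wood_panel"]),
    }

# Arbitrarily many distinct training environments, one per seed.
dataset = [sample_house(seed) for seed in range(10_000)]
```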
RoboTube: Learning Household Manipulation from Human Videos with Simulated Twin Environments
TLDR
RoboTube, a human video dataset, and its simulated digital-twin environments for learning various robotic manipulation tasks are presented, in the hope that RoboTube can lower the barrier to robotics research for beginners while facilitating reproducible research in the community.
Finding Fallen Objects Via Asynchronous Audio-Visual Integration
TLDR
A set of embodied agent baselines is developed based on imitation learning, reinforcement learning, and modular planning, together with an in-depth analysis of the challenges of this new task of multi-modal object localization in 3D virtual environments.
N²M²: Learning Navigation for Arbitrary Mobile Manipulation Motions in Unseen and Dynamic Environments
TLDR
Neural Navigation for Mobile Manipulation is introduced, which extends the decomposition of mobile manipulation into end-effector motion and base navigation to complex obstacle environments, enabling it to tackle a broad range of tasks in real-world settings while providing a simple way to define new mobile manipulation tasks.
Generalizable Task Planning through Representation Pretraining
TLDR
This letter proposes a learning-to-plan method that can generalize to new object instances by leveraging object-level representations extracted from a synthetic scene-understanding dataset, and shows that the model achieves a measurably better success rate than state-of-the-art end-to-end approaches.
Broadly-Exploring, Local-Policy Trees for Long-Horizon Task Planning
TLDR
Broadly-Exploring, Local-policy Trees (BELT) merges these two approaches to leverage the strengths of both through a task-conditioned, model-based tree search, and is demonstrated experimentally to plan long-horizon, sequential trajectories with a goal-conditioned policy and to generate robust plans.
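The "two approaches" are broad, sampling-based tree search and learned goal-conditioned policies. A schematic sketch of how such a search might expand a tree by rolling a local policy out toward sampled subgoals; all callables are placeholders for learned components, and this is not the paper's implementation:

```python
def distance(a, b):
    """Placeholder metric: squared Euclidean distance between state tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def belt_plan(start, is_goal, sample_subgoal, rollout, iterations=1000):
    """Schematic BELT-style search.

    `rollout(state, subgoal)` stands in for running a learned,
    goal-conditioned policy inside a model; it returns
    (trajectory, end_state). States must be hashable tuples."""
    parents = {start: None}
    nodes = [start]
    for _ in range(iterations):
        subgoal = sample_subgoal()                       # broad exploration
        node = min(nodes, key=lambda s: distance(s, subgoal))
        trajectory, end_state = rollout(node, subgoal)   # local-policy edge
        if end_state not in parents:
            parents[end_state] = (node, trajectory)
            nodes.append(end_state)
        if is_goal(end_state):
            plan = []
            while parents[end_state] is not None:        # backtrack to start
                end_state, traj = parents[end_state]
                plan = traj + plan
            return plan
    return None
```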
...

References

SHOWING 1-10 OF 135 REFERENCES
VirtualHome: Simulating Household Activities Via Programs
TLDR
This paper crowd-sources programs for a variety of activities that happen in people's homes, via a game-like interface used for teaching kids how to code; it implements the most common atomic actions in the Unity3D game engine and uses them to "drive" an artificial agent to execute tasks in a simulated household environment.
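A VirtualHome program is an ordered list of atomic steps over indexed objects, written in the paper's "[Action] <object> (id)" convention. A small sketch of the format with a trivial executor loop; the real executor is the Unity3D engine, stubbed out here:

```python
# A VirtualHome-style program: ordered atomic steps over indexed objects.
program = [
    "[Walk] <kitchen> (1)",
    "[Grab] <mug> (1)",
    "[Walk] <table> (1)",
    "[Put] <mug> (1) <table> (1)",
]

def execute(program, simulator_step):
    """Feed each atomic step to a simulator callback in order."""
    for step in program:
        simulator_step(step)

execute(program, print)  # stand-in simulator: just echo each step
```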
iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks
TLDR
The new capabilities of iGibson 2.0 are evaluated in their ability to enable robot learning of novel tasks, in the hope of demonstrating the potential of this new simulator to support new research in embodied AI.
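iGibson 2.0's object-centric design attaches extended, non-kinematic states such as temperature, wetness, and cleanliness to each object and updates them every simulation step. A toy sketch of that idea, with the state names and update rule chosen for illustration rather than taken from the simulator:

```python
from dataclasses import dataclass

@dataclass
class SimObject:
    name: str
    temperature: float = 20.0   # degrees C
    soaked: bool = False
    stained: bool = False

def step_object_states(obj: SimObject, near_heat: bool, in_water: bool):
    """Toy per-step update of extended states from the physical context."""
    if near_heat:
        obj.temperature += 1.0
    if in_water:
        obj.soaked = True
    if obj.soaked and obj.stained:
        obj.stained = False          # e.g., soaking removes a stain

towel = SimObject("towel_1", stained=True)
step_object_states(towel, near_heat=False, in_water=True)
assert towel.soaked and not towel.stained
```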
ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
TLDR
It is shown that a baseline model based on recent embodied vision-and-language tasks performs poorly on ALFRED, suggesting that there is significant room for developing innovative grounded visual language understanding models with this benchmark.
Habitat: A Platform for Embodied AI Research
TLDR
The comparison between learning and SLAM approaches from two recent works is revisited, and evidence is found that learning outperforms SLAM if scaled to an order of magnitude more experience than in previous investigations; the first cross-dataset generalization experiments are also conducted.
Gibson Env: Real-World Perception for Embodied Agents
TLDR
This paper investigates developing real-world perception for active agents, proposes Gibson Environment for this purpose, and showcases a set of perceptual tasks learned therein.
Rearrangement: A Challenge for Embodied AI
TLDR
A framework for research and evaluation in Embodied AI is described, based on a canonical task, Rearrangement, that can focus the development of new techniques and serve as a source of trained models that can be transferred to other settings.
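Rearrangement evaluation scores how closely the final scene matches a goal configuration. One minimal geometric variant, sketched here with an illustrative position tolerance; the paper also discusses predicate-based goal specifications:

```python
import math

def rearrangement_success(final_poses, goal_poses, tol=0.05):
    """Fraction of objects whose final position lies within `tol` meters
    of the goal position; poses are {name: (x, y, z)} dicts."""
    hits = sum(
        1 for name, goal in goal_poses.items()
        if math.dist(final_poses[name], goal) <= tol
    )
    return hits / len(goal_poses)

final = {"mug_1": (1.02, 0.0, 0.8), "book_1": (2.0, 1.0, 0.4)}
goal = {"mug_1": (1.00, 0.0, 0.8), "book_1": (0.0, 1.0, 0.4)}
print(rearrangement_success(final, goal))  # 0.5
```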
Visual Room Rearrangement
TLDR
The experiments show that solving this challenging interactive task, which involves navigation and object interaction, is beyond the capabilities of current state-of-the-art techniques for embodied tasks, and that the field is still very far from perfect performance on these types of tasks.
Interactive Gibson Benchmark: A Benchmark for Interactive Navigation in Cluttered Environments
TLDR
This work presents the first comprehensive benchmark for training and evaluating Interactive Navigation solutions, evaluates multiple learning-based baselines in the Interactive Gibson Benchmark, and provides insights into navigation regimes with different trade-offs between path efficiency and disturbance of surrounding objects.
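The trade-off the benchmark measures can be folded into a single score in the spirit of its Interactive Navigation Score, combining path efficiency with an effort term that penalizes disturbing surrounding objects. A sketch of one such combination; the equal weighting is an illustrative assumption:

```python
def interactive_nav_score(shortest_path, actual_path,
                          min_effort, actual_effort, w=0.5):
    """Weighted mean of path efficiency and effort efficiency in [0, 1].

    Path efficiency compares traveled distance to the shortest path;
    effort efficiency penalizes disturbance of surrounding objects.
    The 50/50 weighting is an illustrative choice, not the benchmark's."""
    path_eff = shortest_path / max(actual_path, shortest_path)
    effort_eff = min_effort / max(actual_effort, min_effort)
    return w * path_eff + (1 - w) * effort_eff

print(interactive_nav_score(10.0, 12.5, 1.0, 2.0))  # 0.65
```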
The ThreeDWorld Transport Challenge: A Visually Guided Task-and-Motion Planning Benchmark for Physically Realistic Embodied AI
TLDR
This work builds this benchmark challenge using the ThreeDWorld simulation: a virtual 3D environment where all objects respond to physics, and a robot agent can be controlled using a fully physics-driven navigation and interaction API.
Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments
TLDR
This work provides the first benchmark dataset for visually grounded natural language navigation in real buildings, the Room-to-Room (R2R) dataset, and presents the Matterport3D Simulator, a large-scale reinforcement learning environment based on real imagery.
...