Ask4Help: Learning to Leverage an Expert for Embodied Tasks

@article{Singh2022Ask4HelpLT,
  title={Ask4Help: Learning to Leverage an Expert for Embodied Tasks},
  author={Kunal Pratap Singh and Luca Weihs and Alvaro Herrasti and Jonghyun Choi and Aniruddha Kembhavi and Roozbeh Mottaghi},
  journal={ArXiv},
  year={2022},
  volume={abs/2211.09960}
}
Embodied AI agents continue to become more capable every year with the advent of new models, environments, and benchmarks, but are still far away from being performant and reliable enough to be deployed in real, user-facing applications. In this paper, we ask: can we bridge this gap by enabling agents to ask for assistance from an expert such as a human being? To this end, we propose the Ask4Help policy that augments agents with the ability to request, and then use, expert assistance. Ask…

References

AllenAct: A Framework for Embodied AI Research

This work introduces AllenAct, a modular and flexible learning framework designed around the unique requirements of Embodied AI research, with first-class support for a growing collection of embodied environments, tasks, and algorithms.

Just Ask: An Interactive Learning Framework for Vision and Language Navigation

This work proposes an interactive learning framework that lets the agent ask for users' help in ambiguous situations, and designs a continual learning strategy, which can be viewed as a data augmentation method, so that the agent can improve further using its interaction history with a human.

Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning

This work introduces “Help, Anna!” (HANNA), an interactive photo-realistic simulator in which an agent fulfills object-finding tasks by requesting and interpreting natural language-and-vision assistance, along with an imitation learning algorithm that teaches the agent to avoid repeating past mistakes while simultaneously predicting its own chances of making future progress.

Auxiliary Tasks and Exploration Enable ObjectGoal Navigation

This work proposes that agents will act to simplify their visual inputs so as to smooth their RNN dynamics, and that auxiliary tasks reduce overfitting by minimizing effective RNN dimensionality; i.e. a performant ObjectNav agent that must maintain coherent plans over long horizons does so by learning smooth, low-dimensional recurrent dynamics.

Vision-and-Dialog Navigation

This work introduces Cooperative Vision-and-Dialog Navigation, a dataset of over 2k embodied, human-human dialogs situated in simulated, photorealistic home environments and establishes an initial, multi-modal sequence-to-sequence model.

Asking for Help Using Inverse Semantics

This work demonstrates an approach for enabling a robot to recover from failures by communicating its need for specific help to a human partner using natural language, and presents a novel inverse semantics algorithm for generating effective help requests.

Recovering from failure by asking for help

This work demonstrates an approach for enabling a robot to recover from failures by communicating its need for specific help to a human partner using natural language, and presents a novel inverse semantics algorithm for generating effective help requests.

Auxiliary Tasks Speed Up Learning PointGoal Navigation

This work develops a method to significantly increase sample and time efficiency in learning PointNav using self-supervised auxiliary tasks (e.g. predicting the action taken between two egocentric observations, predicting the distance between two observations from a trajectory, etc.).

THDA: Treasure Hunt Data Augmentation for Semantic Navigation

This paper shows that the key problem in ObjectNav is overfitting, and introduces Treasure Hunt Data Augmentation (THDA) to address it.

IQA: Visual Question Answering in Interactive Environments

This work proposes the Hierarchical Interactive Memory Network (HIMN), which consists of a factorized set of controllers that allow the system to operate at multiple levels of temporal abstraction, and shows that it outperforms popular single-controller methods on IQUAD V1.
...