TEACh: Task-driven Embodied Agents that Chat

Aishwarya Padmakumar, Jesse Thomason, Ayush Shrivastava, P. Lange, Anjali Narayan-Chen, Spandana Gella, Robinson Piramuthu, Gokhan Tur, and Dilek Z. Hakkani-Tür. AAAI Conference on Artificial Intelligence.
Robots operating in human spaces must be able to engage in natural language interaction, both understanding and executing instructions, and using conversation to resolve ambiguity and correct mistakes. To study this, we introduce TEACh, a dataset of over 3,000 human-human, interactive dialogues to complete household tasks in simulation. A Commander with access to oracle information about a task communicates in natural language with a Follower. The Follower navigates through and interacts with… 

DialFRED: Dialogue-Enabled Agents for Embodied Instruction Following

This work presents DialFRED, a dialogue-enabled embodied instruction following benchmark based on the ALFRED benchmark, and proposes a questioner-performer framework in which the questioner is pre-trained with human-annotated data and fine-tuned with reinforcement learning.

Dialog Acts for Task Driven Embodied Agents

This work proposes a set of dialog acts for modeling task-driven dialogs, annotates the TEACh dataset, which includes over 3,000 situated, task-oriented conversations, with these dialog acts, and demonstrates the use of the annotated dataset in training models to tag the dialog acts of a given utterance.

Don't Copy the Teacher: Data and Model Challenges in Embodied Dialogue

It is argued that imitation learning and related low-level metrics are actually misleading, do not align with the goals of embodied dialogue research, and may hinder progress, and that evaluation should instead focus on higher-level semantic goals.

Language Guided Meta-Control for Embodied Instruction Following

This work proposes a generalised Language Guided Meta-Controller (LMC) for better language grounding in the large action space of the embodied agent and proposes an auxiliary reasoning loss to improve the ‘conceptual grounding’ of the agent.

ACT-Thor: A Controlled Benchmark for Embodied Action Understanding in Simulated Environments

This work uses the AI2-THOR simulated environment to produce a controlled setup in which an agent has to determine what the correct after-image is among a set of possible candidates, and suggests that only models that have a very structured representation of the actions together with powerful visual features can perform well on the task.

DOROTHIE: Spoken Dialogue for Handling Unexpected Situations in Interactive Autonomous Driving Agents

Dialogue On the ROad To Handle Irregular Events (DOROTHIE) is introduced, a novel interactive simulation platform that enables the creation of unexpected situations on the basis of end-to-end models to support empirical studies on situated communication with autonomous driving agents.

DANLI: Deliberative Agent for Following Natural Language Instructions

A neuro-symbolic deliberative agent is proposed that, while following language instructions, proactively applies reasoning and planning based on its neural and symbolic representations acquired from past experience (e.g., natural language and egocentric vision).

JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents

JARVIS, a neuro-symbolic commonsense reasoning framework for modular, generalizable, and interpretable conversational embodied agents, is proposed, and it achieves state-of-the-art (SOTA) results on all three dialog-based embodied tasks.

Incorporating External Knowledge Reasoning for Vision-and-Language Navigation with Assistant’s Help

An Attention-based Knowledge-enabled Cross-modality Reasoning with Assistant’s Help model is designed to address the unique challenges of this task and demonstrates the effectiveness of the method compared with other baselines.

A Framework for Learning to Request Rich and Contextually Useful Information from Humans

A general interactive framework is presented that enables an agent to request and interpret rich, contextually useful information from an assistant that has knowledge about the task and the environment and demonstrates the practicality of the framework on a simulated human-assisted navigation problem.



Collaborative Dialogue in Minecraft

A Minecraft-based collaborative building task is presented in which one player is shown a target structure and must instruct the other player to build it; the subtask of Architect utterance generation is considered and shown to be challenging.

Jointly Improving Parsing and Perception for Natural Language Commands through Human-Robot Dialog

Methods for using human-robot dialog to improve language understanding for a mobile robot agent that parses natural language to underlying semantic meanings and uses robotic sensors to create multi-modal models of perceptual concepts like red and heavy are presented.

RMM: A Recursive Mental Model for Dialog Navigation

This paper introduces a two-agent task where one agent navigates and asks questions that a second, guiding agent answers, and proposes the Recursive Mental Model (RMM), a model that enables better generalization to novel environments.

Learning to Parse Natural Language Commands to a Robot Control System

This work discusses the problem of parsing natural language commands to actions and control structures that can be readily implemented in a robot execution system, and learns a parser based on example pairs of English commands and corresponding control language expressions.

Vision-and-Dialog Navigation

This work introduces Cooperative Vision-and-Dialog Navigation, a dataset of over 2k embodied, human-human dialogs situated in simulated, photorealistic home environments and establishes an initial, multi-modal sequence-to-sequence model.

Asking for Help Using Inverse Semantics

This work demonstrates an approach for enabling a robot to recover from failures by communicating its need for specific help to a human partner using natural language, and presents a novel inverse semantics algorithm for generating effective help requests.

Executing Instructions in Situated Collaborative Interactions

This work introduces a learning approach focused on recovery from cascading errors between instructions, along with modeling methods to explicitly reason about instructions with multiple goals, and observes how users adapt to the system's abilities.

Speaker-Follower Models for Vision-and-Language Navigation

Experiments show that all three components of this approach (speaker-driven data augmentation, pragmatic reasoning, and panoramic action space) dramatically improve the performance of a baseline instruction follower, more than doubling the success rate over the best existing approach on a standard benchmark.

Learning to Interpret Natural Language Navigation Instructions from Observations

A system is presented that learns to transform natural-language navigation instructions into executable formal plans by using a learned lexicon to refine inferred plans and a supervised learner to induce a semantic parser.

Just Ask: An Interactive Learning Framework for Vision and Language Navigation

This work proposes an interactive learning framework to endow the agent with the ability to ask for users' help in ambiguous situations, and designs a continual learning strategy, which can be viewed as a data augmentation method, for the agent to further improve by utilizing its interaction history with a human.