The Dialogue Dodecathlon: Open-Domain Knowledge and Image Grounded Conversational Agents

Kurt Shuster, Da Ju, Stephen Roller, Emily Dinan, Y-Lan Boureau and Jason Weston
We introduce dodecaDialogue: a set of 12 tasks that measures whether a conversational agent can communicate engagingly with personality and empathy, ask questions, answer questions by utilizing knowledge resources, discuss topics and situations, and perceive and converse about images. By multi-tasking on such a broad, large-scale set of data, we hope both to move towards and to measure progress in producing a single unified agent that can perceive, reason and converse with humans in an open-domain…

Multi-Modal Open-Domain Dialogue

This work studies incorporating different image fusion schemes and domain-adaptive pre-training and fine-tuning strategies, and shows that the best resulting model outperforms strong existing models in multi-modal dialogue while simultaneously performing as well as its predecessor (text-only) BlenderBot in text-based conversation.

All-in-One Image-Grounded Conversational Agents

This work designs an architecture that combines state-of-the-art Transformer and ResNeXt modules fed into a novel attentive multimodal module to produce a combined model trained on many tasks, and provides a thorough analysis of the components of the model and of transfer performance when training on one, some, or all of the tasks.

BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage

The goal of this research program is to enable the community to study ever-improving responsible agents that learn through interaction in BlenderBot 3, a 175B parameter dialogue model capable of open-domain conversation with access to the internet and a long-term memory.

DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue

DialoGLUE (Dialogue Language Understanding Evaluation), a public benchmark consisting of 7 task-oriented dialogue datasets covering 4 distinct natural language understanding tasks, is introduced, designed to encourage dialogue research in representation-based transfer, domain adaptation, and sample-efficient task learning.

Fusing task-oriented and open-domain dialogues in conversational agents

A new dataset, based on the popular task-oriented dialogue (TOD) dataset MultiWOZ, is built by rewriting the existing TOD turns and adding new open-domain dialogue (ODD) turns; it features inter-mode contextual dependency, i.e., the dialogue turns from the two modes depend on each other.

FETA: A Benchmark for Few-Sample Task Transfer in Open-Domain Dialogue

Task transfer in conversational AI is explored by introducing FETA, a benchmark for FEw-sample TAsk transfer in open-domain dialogue, enabling the study of intra-dataset task transfer, i.e., task transfer without domain adaptation.

FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows

This work develops FlowEval, the first consensus-based dialogue evaluation framework, which provides a reference-free approach to dialogue evaluation by finding pseudo-references; it also proposes segment act, an extension of dialogue act from the utterance level to the segment level, and crowdsources a large-scale dataset for it.

Adding Chit-Chat to Enhance Task-Oriented Dialogues

Automatic and human evaluations show that the proposed Adding Chit-Chat to ENhance Task-ORiented dialogues (ACCENTOR) models can code-switch between task and chit-chat to be more engaging, interesting, knowledgeable, and humanlike, while maintaining competitive task performance.

Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering

This work proposes a novel and simple recipe to pre-train a vision-language joint model, which is multi-task as well, and observes that the proposed approach is able to generalize to unseen tasks and that more diverse mixtures lead to higher accuracy in both known and novel tasks.

MMChat: Multi-Modal Chat Dataset on Social Media

A benchmark model that addresses the sparsity issue in dialogue generation tasks by adapting an attention routing mechanism to image features is developed, and experiments demonstrate the usefulness of incorporating image features and the model's effectiveness in handling their sparsity.

Wizard of Wikipedia: Knowledge-Powered Conversational Agents

The best performing dialogue models are able to conduct knowledgeable discussions on open-domain topics as evaluated by automatic metrics and human evaluations, while a new benchmark allows for measuring further improvements in this important research direction.

I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-oriented dialogue agents

A goal-oriented model is trained with reinforcement learning against an imitation-learned "chit-chat" model using two approaches, and it is shown that both models outperform an inverse model baseline and can converse naturally with their dialogue partner in order to achieve goals.

Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset

This work proposes a new benchmark for empathetic dialogue generation and EmpatheticDialogues, a novel dataset of 25k conversations grounded in emotional situations, and presents empirical comparisons of dialogue model adaptations for empathetic responding, leveraging existing models or datasets without requiring lengthy re-training of the full model.

TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents

A new approach to generative data-driven dialogue systems (e.g. chatbots) called TransferTransfo is introduced, a combination of a transfer-learning-based training scheme and a high-capacity Transformer model, which shows strong improvements over the current state-of-the-art end-to-end conversational models.

Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation

This work presents a novel task, Image Grounded Conversations (IGC), in which natural-sounding conversations are generated about a shared image, and introduces a new multiple reference dataset of crowd-sourced, event-centric conversations on images.

Conversing by Reading: Contentful Neural Conversation with On-demand Machine Reading

A new end-to-end approach to contentful neural conversation that jointly models response generation and on-demand machine reading is presented, allowing for more focused integration of external knowledge than has been possible in prior approaches.

The Second Conversational Intelligence Challenge (ConvAI2)

To improve performance on multi-turn conversations with humans, future systems must go beyond single-word metrics like perplexity and measure performance across sequences of utterances (conversations), in terms of repetition, consistency and balance of dialogue acts.

ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons

A novel evaluation procedure is introduced that compares two full dialogues, where a human judge is asked to pay attention to only one speaker within each and make a pairwise judgment, resulting in better tests.

Engaging Image Chat: Modeling Personality in Grounded Dialogue

This work collects a large dataset of grounded human-human conversations, where humans are asked to play the role of a given personality, as the use of personality in conversation has also been shown to be engaging.