Wizard of Wikipedia: Knowledge-Powered Conversational Agents
- Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, J. Weston
- Computer Science · International Conference on Learning…
- 27 September 2018
The best-performing dialogue models conduct knowledgeable discussions on open-domain topics, as judged by both automatic metrics and human evaluations, while a new benchmark enables measuring further improvements in this important research direction.
Recipes for Building an Open-Domain Chatbot
- Stephen Roller, Emily Dinan, J. Weston
- Computer Science · Conference of the European Chapter of the…
- 28 April 2020
Human evaluations show that the best models outperform existing approaches in multi-turn dialogue on measures of engagingness and humanness; the limitations of this work are discussed through an analysis of the models' failure cases.
OPT: Open Pre-trained Transformer Language Models
- Susan Zhang, Stephen Roller, Luke Zettlemoyer
- Computer Science · arXiv
- 2 May 2022
This work presents Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which the authors aim to share fully and responsibly with interested researchers.
Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring
- Samuel Humeau, Kurt Shuster, Marie-Anne Lachaux, J. Weston
- Computer Science · International Conference on Learning…
- 30 April 2020
This work develops a new transformer architecture, the Poly-encoder, that learns global rather than token-level self-attention features, achieving state-of-the-art results on four tasks.
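The Poly-encoder's two-level attention can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: random arrays stand in for encoder outputs, and all dimensions are arbitrary. The key idea shown is that the final score is a single dot product, so candidate embeddings can be precomputed and cached.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n_ctx_tokens = 16, 4, 10  # hidden size, number of codes, context length

# Stand-ins for encoder outputs (a real model would produce these).
ctx_tokens = rng.normal(size=(n_ctx_tokens, d))  # token-level context features
codes = rng.normal(size=(m, d))                  # m learned query "codes"
cand = rng.normal(size=(d,))                     # candidate response embedding

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Step 1: each code attends over the context tokens, giving m global
# context vectors instead of one vector per token.
attn = np.stack([softmax(codes[i] @ ctx_tokens.T) for i in range(m)])
global_ctx = attn @ ctx_tokens           # shape (m, d)

# Step 2: the candidate attends over the m global vectors, collapsing
# them into a single context representation.
w = softmax(cand @ global_ctx.T)
ctx_final = w @ global_ctx               # shape (d,)

# Step 3: the score is a plain dot product, so candidates can be
# pre-encoded offline for fast retrieval.
score = float(ctx_final @ cand)
```

This sits between a Bi-encoder (one context vector, fastest) and a Cross-encoder (full token-level cross-attention, most accurate): the m codes keep scoring cheap while retaining more context detail than a single vector.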
The Second Conversational Intelligence Challenge (ConvAI2)
- Emily Dinan, V. Logacheva, J. Weston
- Computer Science · The NeurIPS '18 Competition
- 31 January 2019
To improve performance in multi-turn conversations with humans, future systems must go beyond word-level metrics such as perplexity and measure performance across sequences of utterances (whole conversations) in terms of repetition, consistency, and balance of dialogue acts.
Internet-Augmented Dialogue Generation
- M. Komeili, Kurt Shuster, J. Weston
- Computer Science · Annual Meeting of the Association for…
- 15 July 2021
This work proposes an approach that learns to generate an internet search query from the dialogue context and then conditions on the search results to generate a response, allowing the model to employ up-to-the-minute relevant information.
Retrieval Augmentation Reduces Hallucination in Conversation
- Kurt Shuster, Spencer Poff, Moya Chen, Douwe Kiela, J. Weston
- Computer Science · Conference on Empirical Methods in Natural…
- 15 April 2021
This work explores neural-retrieval-in-the-loop architectures, recently shown to be effective in open-domain QA, for knowledge-grounded dialogue, a task that is arguably more challenging because it requires querying based on complex multi-turn dialogue context and generating conversationally coherent responses.
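The retrieval-in-the-loop idea above can be sketched as a toy pipeline: retrieve a passage relevant to the dialogue context, then hand both to a generator. This is purely illustrative; the paper uses neural retrievers, whereas this sketch substitutes simple word overlap, and the documents and dialogue are made up.

```python
# Toy knowledge store (illustrative documents, not from the paper).
knowledge = [
    "The Eiffel Tower is in Paris and was completed in 1889.",
    "Mount Everest is the highest mountain above sea level.",
    "The Great Barrier Reef is the world's largest coral reef system.",
]

def overlap_score(query: str, doc: str) -> int:
    """Score a document by word overlap with the query (toy stand-in
    for a neural retriever)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(dialogue_context: str, docs: list[str]) -> str:
    """Return the document most relevant to the multi-turn context."""
    return max(docs, key=lambda d: overlap_score(dialogue_context, d))

# The generator then conditions on both the dialogue and the retrieved
# passage, grounding its response in retrieved knowledge rather than
# relying only on facts memorized in its parameters.
context = "A: I love climbing. B: Me too! What is the highest mountain?"
passage = retrieve(context, knowledge)
prompt = f"knowledge: {passage}\ndialogue: {context}\nresponse:"
```

Grounding generation on a retrieved passage is what reduces hallucination: the model can copy or paraphrase retrieved facts instead of inventing them.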
Image-Chat: Engaging Grounded Conversations
- Kurt Shuster, Samuel Humeau, Antoine Bordes, J. Weston
- Computer Science · Annual Meeting of the Association for…
- 2 November 2018
Automatic metrics and human evaluations of engagingness show the efficacy of this approach: state-of-the-art performance is obtained on the existing IGC task, and the best-performing model is nearly on par with humans on the Image-Chat test set.
Can You Put it All Together: Evaluating Conversational Agents’ Ability to Blend Skills
- Eric Michael Smith, Mary Williamson, Kurt Shuster, J. Weston, Y-Lan Boureau
- Computer Science · Annual Meeting of the Association for…
- 17 April 2020
This work investigates several ways to combine models trained for isolated capabilities, ranging from simple model-aggregation schemes that require minimal additional training to various forms of multi-task training that encompass several skills at all training stages.
...