I like fish, especially dolphins: Addressing Contradictions in Dialogue Modeling

Yixin Nie, Mary Williamson, Mohit Bansal, Douwe Kiela, Jason Weston
To quantify how well natural language understanding models can capture consistency in a general conversation, we introduce the DialoguE COntradiction DEtection task (DECODE) and a new conversational dataset containing both human-human and human-bot contradictory dialogues. We show that: (i) our newly collected dataset is notably more effective at providing supervision for the dialogue contradiction detection task than existing NLI data, including data aimed at covering the dialogue domain; (ii… 

CDConv: A Benchmark for Contradiction Detection in Chinese Conversations

This work designs a series of methods for automatic conversation generation, which simulate common user behaviors that trigger chatbots to make contradictions, and conducts careful manual quality screening of the constructed conversations, showing that state-of-the-art Chinese chatbots can be easily goaded into making contradictions.

Improving Bot Response Contradiction Detection via Utterance Rewriting

This work aims to improve contradiction detection by rewriting all bot utterances to restore co-references and ellipsis, and empirically demonstrates that its rewriting model can produce satisfactory rewrites that make bot utterances more complete.

DynaEval: Unifying Turn and Dialogue Level Evaluation

This work proposes DynaEval, a unified automatic evaluation framework that not only performs turn-level evaluation but also holistically considers the quality of the entire dialogue.

Commonsense-Focused Dialogues for Response Generation: An Empirical Study

This paper automatically extracts commonsense-focused dialogues from existing dialogue datasets by leveraging ConceptNet, a commonsense knowledge graph, and proposes an automatic commonsense evaluation approach, based on features derived from ConceptNet and from pre-trained language and dialogue models, that shows reasonable correlation with human evaluation of the commonsense quality of responses.

FaithDial: A Faithful Benchmark for Information-Seeking Dialogue

This work creates FaithDial, a new benchmark for hallucination-free dialogues, by editing hallucinated responses in the Wizard of Wikipedia (WoW) benchmark, benchmarks a series of state-of-the-art models, and proposes an auxiliary contrastive objective that achieves the highest level of faithfulness and abstractiveness.

Don’t be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System

This paper introduces CI-ToD, a novel dataset for Consistency Identification in Task-oriented Dialog systems. It not only annotates a single label that lets a model judge whether the system response is contradictory, but also provides finer-grained labels that encourage the model to identify which source of inconsistency causes the contradiction.

Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning

This work introduces InstructDial, an instruction tuning framework for dialogue, which consists of a repository of 48 diverse dialogue tasks in a unified text-to-text format created from 59 openly available dialogue datasets, and reveals that it enables good zero-shot performance on unseen datasets and tasks such as dialogue evaluation and intent detection, and even better performance in a few-shot setting.

Open-domain Dialogue Generation: What We Can Do, Cannot Do, And Should Do Next

The goal of this work is to provide an overview of recent advances in the field of open-domain dialogue, to summarize issues related to ethics, bias, and fairness that the field has identified as well as typical errors of dialogue systems and to outline important future challenges.

A Stack-Propagation Framework for Low-Resource Personalized Dialogue Generation

Under different low-resource settings, subjective and objective evaluations show that the stack-propagation framework outperforms strong baselines in response quality and persona consistency, and largely overcomes the heavy reliance of traditional models on persona-dense dialogue data.

EVA2.0: Investigating Open-Domain Chinese Dialogue Systems with Large-Scale Pre-Training

Automatic and human evaluations show that the proposed EVA2.0, a large-scale pre-trained open-domain Chinese dialogue model with 2.8 billion parameters, outperforms other open-source counterparts.



Evaluating Coherence in Dialogue Systems using Entailment

Results show that interpretable metrics for evaluating topic coherence by making use of distributed sentence representations can be used as a surrogate for human judgment, making it easy to evaluate dialogue systems on large-scale datasets and allowing an unbiased estimate for the quality of the responses.

Wizard of Wikipedia: Knowledge-Powered Conversational agents

The best performing dialogue models are able to conduct knowledgeable discussions on open-domain topics as evaluated by automatic metrics and human evaluations, while a new benchmark allows for measuring further improvements in this important research direction.

Don’t Say That! Making Inconsistent Dialogue Unlikely with Unlikelihood Training

This work shows how a set of problems in generative dialogue models, including inconsistent responses, can be addressed by extending the recently introduced unlikelihood loss to them, and demonstrates the efficacy of this approach across several dialogue tasks.

DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation

It is shown that conversational systems that leverage DialoGPT generate more relevant, contentful and context-consistent responses than strong baseline systems.

Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset

This work proposes a new benchmark for empathetic dialogue generation and EmpatheticDialogues, a novel dataset of 25k conversations grounded in emotional situations, and presents empirical comparisons of dialogue model adaptations for empathetic responding that leverage existing models or datasets without requiring lengthy re-training of the full model.

Retrieve and Refine: Improved Sequence Generation Models For Dialogue

This work develops a model that combines retrieval and generation to avoid the deficiencies of each: it first retrieves a response and then refines it, with the final sequence generator treating the retrieved response as additional context.

Personalizing Dialogue Agents: I have a dog, do you have pets too?

This work collects data and trains models to condition on their given profile information and on information about the person they are talking to, resulting in improved dialogues as measured by next-utterance prediction.

Consistent Dialogue Generation with Self-supervised Feature Learning

This paper proposes a neural conversation model that generates consistent responses by maintaining certain features related to topics and personas throughout the conversation by adopting a binary feature representation and introducing a feature disentangling loss.

Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models

The recently proposed hierarchical recurrent encoder-decoder neural network is extended to the dialogue domain, and it is demonstrated that this model is competitive with state-of-the-art neural language models and back-off n-gram models.

What makes a good conversation? How controllable attributes affect human judgments

This work examines two controllable neural text generation methods, conditional training and weighted decoding, in order to control four important attributes for chit-chat dialogue: repetition, specificity, response-relatedness and question-asking, and shows that by controlling combinations of these variables their models obtain clear improvements in human quality judgments.