Learning from Easy to Complex: Adaptive Multi-curricula Learning for Neural Dialogue Generation

@inproceedings{Cai2020LearningFE,
  title={Learning from Easy to Complex: Adaptive Multi-curricula Learning for Neural Dialogue Generation},
  author={Hengyi Cai and Hongshen Chen and Cheng Zhang and Yonghao Song and Xiaofang Zhao and Yangxi Li and Dongsheng Duan and Dawei Yin},
  booktitle={AAAI},
  year={2020}
}
Current state-of-the-art neural dialogue systems are mainly data-driven and are trained on human-generated responses. However, due to the subjectivity and open-ended nature of human conversations, the complexity of training dialogues varies greatly. The noise and uneven complexity of query-response pairs impede the learning efficiency and effectiveness of neural dialogue generation models. What is more, so far there are no unified dialogue complexity measurements, and the dialogue complexity…
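
The easy-to-complex regime the abstract describes can be realized as competence-based curriculum sampling: rank the training pairs by a difficulty score, then at each step draw mini-batches only from the easiest fraction of the data, growing that fraction as training proceeds. The following is a minimal sketch of that general idea, not the authors' adaptive multi-curricula algorithm; the `difficulty` callable and the square-root pacing function are assumptions.

```python
import math
import random

def competence(step, total_steps, c0=0.1):
    """Square-root pacing: fraction of the difficulty-sorted corpus
    available at a given step, growing from c0 to 1.0."""
    return min(1.0, math.sqrt(c0 ** 2 + (1 - c0 ** 2) * step / total_steps))

def curriculum_batches(pairs, difficulty, total_steps, batch_size=32):
    """Yield mini-batches drawn only from the easiest competence(t) fraction.

    pairs      : list of (query, response) training pairs
    difficulty : callable scoring one pair (higher = harder); assumed given
    """
    ranked = sorted(pairs, key=difficulty)  # easiest first
    for step in range(total_steps):
        limit = int(competence(step, total_steps) * len(ranked))
        limit = min(len(ranked), max(batch_size, limit))
        yield random.sample(ranked[:limit], min(batch_size, limit))
```

In the paper's multi-curricula setting there is one such difficulty ranking per dialogue attribute and a learned policy that adaptively switches among the curricula; the sketch above shows a single fixed curriculum.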

Data Manipulation: Towards Effective Instance Learning for Neural Dialogue Generation via Learning to Augment and Reweight

This paper proposes a data manipulation framework that proactively reshapes the data distribution towards reliable samples by augmenting and highlighting effective learning samples while simultaneously reducing the effect of inefficient ones.

Combining Curriculum Learning and Knowledge Distillation for Dialogue Generation

Curriculum learning, a machine training strategy that feeds training instances to the model from easy to hard, has been proven to facilitate the dialogue generation task. Meanwhile, knowledge distillation transfers knowledge from a large teacher model to a smaller student model.
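
As a rough illustration of the distillation half of that combination, a student model can be trained on a blend of hard-label cross-entropy and a temperature-softened KL term against a teacher's logits. A minimal PyTorch-style sketch, with the mixing weight `alpha` and temperature `T` as assumed hyperparameters:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, alpha=0.5, T=2.0):
    """Blend hard-label cross-entropy with a softened KL term to the teacher.

    student_logits, teacher_logits : (batch, vocab) unnormalized scores
    targets                        : (batch,) gold token ids
    """
    ce = F.cross_entropy(student_logits, targets)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # standard rescaling so the soft-term gradients keep their magnitude
    return alpha * ce + (1 - alpha) * kl
```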

Group-wise Contrastive Learning for Neural Dialogue Generation

This work introduces contrastive learning into dialogue generation, where the model explicitly perceives the difference between well-chosen positive and negative utterances, and augments contrastive dialogue learning with group-wise dual sampling.
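
A generic way to realize such a contrastive objective is an InfoNCE-style loss that pulls a dialogue-context representation toward its positive response and away from sampled negatives. The sketch below assumes precomputed embeddings and is not the paper's group-wise dual-sampling procedure:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(context, positive, negatives, tau=0.1):
    """InfoNCE over one positive and K negative response embeddings.

    context   : (batch, dim) dialogue-context embeddings
    positive  : (batch, dim) embeddings of the gold responses
    negatives : (batch, K, dim) embeddings of sampled negative responses
    """
    pos = F.cosine_similarity(context, positive, dim=-1) / tau               # (batch,)
    neg = F.cosine_similarity(context.unsqueeze(1), negatives, dim=-1) / tau  # (batch, K)
    logits = torch.cat([pos.unsqueeze(1), neg], dim=1)                        # (batch, 1+K)
    labels = torch.zeros(context.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)  # positive sits at index 0
```

The temperature `tau` controls how sharply the loss concentrates on the hardest negatives.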

A Model-agnostic Data Manipulation Method for Persona-based Dialogue Generation

This work proposes a model-agnostic data manipulation method that can be paired with any persona-based dialogue generation model to improve its performance, and shows various effective ways to diversify such distilled easier data.

Identifying Untrustworthy Samples: Data Filtering for Open-domain Dialogues with Bayesian Optimization

This paper presents a data filtering method for open-domain dialogues, which identifies untrustworthy samples in the training data with a quality measure that linearly combines seven dialogue attributes, and proposes a training framework that integrates maximum likelihood estimation (MLE) and a negative training method (NEG).
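
The core of such a filter is a scalar quality score formed as a weighted sum of per-sample attribute values, with low-scoring samples discarded; in the paper the combination weights are tuned by Bayesian optimization, which the sketch below simply takes as given. The attribute names and threshold are assumptions:

```python
def quality_score(attrs, weights):
    """Linear quality measure: weighted sum of dialogue attribute values.

    attrs   : dict mapping attribute name -> value for one query-response pair
              (e.g. fluency, relatedness, specificity, ... -- names assumed)
    weights : dict with the same keys; in the paper these would be tuned
              by Bayesian optimization rather than fixed by hand
    """
    return sum(weights[name] * value for name, value in attrs.items())

def filter_dialogues(samples, weights, threshold=0.5):
    """Keep only samples whose linear quality score clears the threshold."""
    return [s for s in samples if quality_score(s["attrs"], weights) >= threshold]
```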

On Curriculum Learning for Commonsense Reasoning

This work uses paced curriculum learning to rank data and sample training mini-batches with increasing levels of difficulty from the ranked dataset during finetuning, and finds that prioritizing the difficult samples in the tail end of training improves generalization to unseen in-domain data as well as out-of-domain data.

Open-domain Dialogue Generation: What We Can Do, Cannot Do, And Should Do Next

The goal of this work is to provide an overview of recent advances in the field of open-domain dialogue, to summarize issues related to ethics, bias, and fairness that the field has identified, as well as typical errors of dialogue systems, and to outline important future challenges.

Preview, Attend and Review: Schema-Aware Curriculum Learning for Multi-Domain Dialogue State Tracking

This paper proposes a model-agnostic framework called Schema-aware Curriculum Learning for Dialog State Tracking (SaCLog), which consists of a preview module that pre-trains a DST model with schema information, a curriculum module that optimizes the model with CL, and a review module that augments mispredicted data to reinforce the CL training.

CDL: Curriculum Dual Learning for Emotion-Controllable Response Generation

Curriculum Dual Learning is proposed, which extends emotion-controllable response generation to a dual task that generates emotional responses and emotional queries alternately; it significantly outperforms the baselines in terms of coherence, diversity, and relation to emotion factors.

An Empirical Study on the Overlapping Problem of Open-Domain Dialogue Datasets

This work observes the overlapping problem in DailyDialog and OpenSubtitles, two popular open-domain dialogue benchmark datasets, and shows that such overlap can be exploited to obtain fake state-of-the-art performance.

References

Not All Dialogues are Created Equal: Instance Weighting for Neural Conversational Models

Evaluation results on retrieval-based models trained on movie and TV subtitles demonstrate that the inclusion of such a weighting model improves the model performance on unsupervised metrics.
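
Instance weighting of this kind typically amounts to scaling each example's loss by a quality weight before averaging. A minimal PyTorch-style sketch, assuming the per-pair weights come from a separately trained weighting model:

```python
import torch
import torch.nn.functional as F

def weighted_nll(logits, targets, instance_weights):
    """Per-instance weighted negative log-likelihood.

    logits           : (batch, seq, vocab) decoder outputs
    targets          : (batch, seq) gold token ids
    instance_weights : (batch,) quality weights from a weighting model
    """
    token_nll = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).view(targets.size())               # (batch, seq) token-level losses
    per_example = token_nll.mean(dim=1)  # average over the sequence
    return (instance_weights * per_example).mean()
```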

Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models

The recently proposed hierarchical recurrent encoder-decoder neural network is extended to the dialogue domain, and it is demonstrated that this model is competitive with state-of-the-art neural language models and back-off n-gram models.

Augmenting End-to-End Dialog Systems with Commonsense Knowledge

This model represents the first attempt at integrating a large commonsense knowledge base into end-to-end conversational models, and the results suggest that the knowledge-augmented models are superior to their knowledge-free counterparts.

Curriculum Learning for Natural Answer Generation

A curriculum learning based framework for natural answer generation (CL-NAG) is proposed, which takes full advantage of the valuable learning data in a noisy, uneven-quality corpus and outperforms the state of the art.

ReCoSa: Detecting the Relevant Contexts with Self-Attention for Multi-turn Dialogue Generation

Experimental results on both a Chinese customer service dataset and the English Ubuntu dialogue dataset show that ReCoSa significantly outperforms baseline models in terms of both metric-based and human evaluations.

A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues

A neural network-based generative architecture with stochastic latent variables spanning a variable number of time steps is proposed; it improves upon recently proposed models, and its latent variables facilitate both the generation of meaningful, long, and diverse responses and the maintenance of dialogue state.

What makes a good conversation? How controllable attributes affect human judgments

This work examines two controllable neural text generation methods, conditional training and weighted decoding, to control four important attributes of chit-chat dialogue: repetition, specificity, response-relatedness, and question-asking. It shows that by controlling combinations of these variables the models obtain clear improvements in human quality judgments.
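
Of the two methods, weighted decoding is the simpler to sketch: at each decoding step, a weighted sum of token-level feature scores is added to the model's log-probabilities before the next token is chosen. The feature functions and weights below are illustrative assumptions, not the paper's exact features:

```python
import torch

def weighted_decoding_step(log_probs, features, weights):
    """Shift next-token log-probabilities by weighted control features.

    log_probs : (vocab,) model log-probabilities for the next token
    features  : list of (vocab,) tensors, one score per candidate token
                (e.g. a penalty on already-generated tokens to curb repetition)
    weights   : list of floats, one per feature; sign and size set the control
    """
    adjusted = log_probs.clone()
    for w, f in zip(weights, features):
        adjusted += w * f
    return torch.argmax(adjusted)  # greedy pick; sampling works the same way
```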

Simple and Effective Curriculum Pointer-Generator Networks for Reading Comprehension over Long Narratives

A curriculum learning (CL) based pointer-generator framework is proposed for reading and sampling over large documents, enabling diverse training of the neural model based on the notion of alternating contextual difficulty, together with a new Introspective Alignment Layer (IAL) that reasons over decomposed alignments using block-based self-attention.

DialogWAE: Multimodal Response Generation with Conditional Wasserstein Auto-Encoder

DialogWAE, a conditional Wasserstein autoencoder specially designed for dialogue modeling, is proposed; it models the data distribution by training a GAN within the latent variable space and develops a Gaussian mixture prior network to enrich the latent space.

Improving Neural Conversational Models with Entropy-Based Data Filtering

This work presents a method of filtering dialog datasets by removing generic utterances from training data using a simple entropy-based approach that does not require human supervision, and shows that training on datasets filtered this way results in better conversational quality as chatbots learn to output more diverse responses.
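
One simple instantiation of such entropy-based filtering scores each response by the entropy of the distribution of source utterances it co-occurs with: generic responses pair with many different sources and therefore have high entropy. A rough sketch of that idea (the threshold is an assumption, and the paper's exact formulation may differ):

```python
import math
from collections import Counter, defaultdict

def target_entropies(pairs):
    """Entropy of the source distribution for each target utterance.

    pairs : list of (source, target) utterance pairs; generic targets
            ("i don't know", ...) co-occur with many sources -> high entropy
    """
    sources_per_target = defaultdict(Counter)
    for src, tgt in pairs:
        sources_per_target[tgt][src] += 1
    entropies = {}
    for tgt, counts in sources_per_target.items():
        total = sum(counts.values())
        entropies[tgt] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropies

def filter_generic(pairs, max_entropy=4.0):
    """Drop pairs whose target's source-distribution entropy is too high."""
    ent = target_entropies(pairs)
    return [(s, t) for s, t in pairs if ent[t] <= max_entropy]
```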