Dialogue Response Selection with Hierarchical Curriculum Learning

Authors: Yixuan Su, Deng Cai, Qingyu Zhou, Zibo Lin, Simon Baker, Yunbo Cao, Shuming Shi, Nigel Collier, Yan Wang

We study the learning of a matching model for dialogue response selection. Motivated by the recent finding that models trained with random negative samples are not ideal in real-world scenarios, we propose a hierarchical curriculum learning framework that trains the matching model in an “easy-to-difficult” scheme. Our learning framework consists of two complementary curricula: (1) corpus-level curriculum (CC); and (2) instance-level curriculum (IC). In CC, the model gradually increases its… 
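
The abstract describes training the matching model in an "easy-to-difficult" order. The Python sketch below illustrates only that general curriculum idea. It is not the paper's corpus-level or instance-level procedure, whose details are cut off above. It assumes each training example already carries a difficulty score from some external scorer. It also uses a hypothetical linear pacing function (`linear_pacing`, with a made-up `start_frac`) that widens the pool of eligible examples as training progresses. All names here are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple


def linear_pacing(step: int, total_steps: int, start_frac: float = 0.2) -> float:
    """Fraction of the difficulty-sorted pool that may be sampled at `step`.

    Hypothetical linear schedule: starts at `start_frac`, grows to 1.0
    at `total_steps`, and stays at 1.0 afterwards.
    """
    if total_steps <= 0:
        return 1.0
    progress = min(max(step, 0) / total_steps, 1.0)
    return start_frac + (1.0 - start_frac) * progress


def curriculum_batch(
    examples: Sequence[Tuple[str, float]],
    step: int,
    total_steps: int,
    batch_size: int,
    rng: random.Random,
) -> List[str]:
    """Sample a batch (with replacement) from the easiest examples allowed at `step`.

    `examples` holds (item, difficulty) pairs; lower difficulty means easier.
    The difficulty scores are assumed to be precomputed elsewhere.
    """
    if not examples:
        raise ValueError("examples must be non-empty")
    # Sorting on every call keeps the sketch short; a real loop would sort once.
    ranked = sorted(examples, key=lambda pair: pair[1])
    frac = linear_pacing(step, total_steps)
    # Keep at least one example eligible even very early in training.
    cutoff = max(1, round(frac * len(ranked)))
    pool = [item for item, _ in ranked[:cutoff]]
    return [rng.choice(pool) for _ in range(batch_size)]


if __name__ == "__main__":
    # Toy data: ten hypothetical context-response pairs with made-up difficulty scores.
    data = [(f"pair{i}", i / 10) for i in range(10)]
    rng = random.Random(0)
    print(curriculum_batch(data, step=0, total_steps=100, batch_size=5, rng=rng))
    print(curriculum_batch(data, step=100, total_steps=100, batch_size=5, rng=rng))
```

With these toy scores, only `pair0` and `pair1` can be drawn at step 0. By step 100, all ten pairs are eligible. To try a different schedule, such as a square-root or step-wise pacing function, you would only need to change `linear_pacing`.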

A Survey on Response Selection for Retrieval-based Dialogues

A comprehensive survey of response selection for retrieval-based dialogues that summarizes recent research advances, including the incorporation of extra knowledge and the exploration of more effective model learning.

Advances in Multi-turn Dialogue Comprehension: A Survey

The characteristics and challenges of dialogue comprehension, in contrast to plain-text reading comprehension, are summarized, and three typical patterns of dialogue modeling that are widely used in dialogue comprehension tasks such as response selection and conversational question answering are discussed.

Preview, Attend and Review: Schema-Aware Curriculum Learning for Multi-Domain Dialogue State Tracking

This paper proposes a model-agnostic framework called Schema-aware Curriculum Learning for Dialog State Tracking (SaCLog), which consists of a preview module that pre-trains a DST model with schema information, a curriculum module that optimizes the model with CL, and a review module that augments mispredicted data to reinforce the CL training.

CLINE: Contrastive Learning with Semantic Negative Examples for Natural Language Understanding

This work proposes Contrastive Learning with semantIc Negative Examples (CLINE), which constructs semantic negative examples in an unsupervised manner to improve robustness against semantic adversarial attacks, and yields substantial improvements on a range of sentiment analysis, reasoning, and reading comprehension tasks.

Small Changes Make Big Differences: Improving Multi-turn Response Selection in Dialogue Systems via Fine-Grained Contrastive Learning

A novel fine-grained contrastive (FGC) learning method is proposed for the response selection task based on pre-trained language models (PLMs). It generates more distinguishable representations of the context-response pairs of each dialogue at a fine-grained level, and thereby makes better predictions when choosing the positive response.

On Task-Adaptive Pretraining for Dialogue Response Selection

It is shown that initializing with RoBERTa achieves performance similar to BERT, and that MLM+NSP can outperform all previously proposed task-adaptive pretraining (TAP) tasks; in the process, a new state of the art is established on the Ubuntu corpus.

PCC: Paraphrasing with Bottom-k Sampling and Cyclic Learning for Curriculum Data Augmentation

Human evaluation and extensive case studies indicate that bottom-k sampling effectively generates super-hard instances, and PCC significantly improves the baseline dialogue agent.

A Contrastive Framework for Neural Text Generation

This work shows that an underlying reason for model degeneration is the anisotropic distribution of token representations, and presents a contrastive solution that outperforms state-of-the-art text generation methods as evaluated by both human and automatic metrics.

Reranking Overgenerated Responses for End-to-End Task-Oriented Dialogue Systems

A simple yet effective reranking method is proposed that aims to select high-quality items from the lists of responses initially overgenerated by the system. It uses any sequence-level (similarity) scoring function to divide the semantic space of responses into high-scoring and low-scoring partitions.

Contrastive Search Is What You Need For Neural Text Generation

Surprisingly, it is found that the anisotropy problem exists only in the two English GPT-2-small/medium models, contrary to the conclusion drawn by previous studies [6, 24].



Learning a Matching Model with Co-teaching for Multi-turn Response Selection in Retrieval-based Dialogue Systems

To learn a robust matching model from noisy training data, a general co-teaching framework is proposed with three specific teaching strategies that cover both teaching with loss functions and teaching with a data curriculum.

Sampling Matters! An Empirical Study of Negative Sampling Strategies for Learning of Matching Models in Retrieval-based Dialogue Systems

Empirical studies indicate that, compared with the widely used random sampling strategy, the first two of the studied strategies lead to a performance drop, whereas the latter two bring consistent improvements to the performance of all the models on both benchmarks.

The World Is Not Binary: Learning to Rank with Grayscale Data for Dialogue Response Selection

This work shows that grayscale data can be automatically constructed without human effort, and proposes multi-level ranking objectives for training, which can teach a matching model to capture more fine-grained context-response relevance difference and reduce the train-test discrepancy in terms of distractor strength.

Speaker-Aware BERT for Multi-Turn Response Selection in Retrieval-Based Chatbots

A new model, Speaker-Aware BERT (SA-BERT), is proposed to make the model aware of speaker-change information, which is an important and intrinsic property of multi-turn dialogues. In addition, a speaker-aware disentanglement strategy is proposed to handle entangled dialogues.

Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-Based Chatbots

A sequential matching network (SMN) first matches a response with each utterance in the context on multiple levels of granularity, and distills important matching information from each pair as a vector with convolution and pooling operations.

Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots

Experiments show that IMN outperforms the baseline models on all metrics, achieving a new state-of-the-art performance and demonstrating compatibility across domains for multi-turn response selection.

Modeling Multi-turn Conversation with Deep Utterance Aggregation

This paper formulates previous utterances into context using a proposed deep utterance aggregation model to form a fine-grained context representation, and shows that the model outperforms state-of-the-art methods on three multi-turn conversation benchmarks, including a newly introduced e-commerce dialogue corpus.

Constructing Interpretive Spatio-Temporal Features for Multi-Turn Responses Selection

Evaluation on two large-scale multi-turn response selection tasks demonstrates that the proposed Spatio-Temporal Matching network (STM) significantly outperforms the state-of-the-art model, exploits matching information in segment pairs and time sequences, and offers good interpretability for multi-turn text matching.

Multi-hop Selector Network for Multi-turn Response Selection in Retrieval-based Chatbots

The side effect of using too many context utterances is analyzed, and a multi-hop selector network (MSN) is proposed to alleviate the problem; results show that MSN outperforms some state-of-the-art methods on three public multi-turn dialogue datasets.

Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network

This paper investigates matching a response with its multi-turn context using dependency information based entirely on attention, inspired by the Transformer in machine translation. It extends the attention mechanism in two ways and jointly introduces the two resulting kinds of attention in one uniform neural network.