Towards Robust Online Dialogue Response Generation

Leyang Cui, Fandong Meng, Yanjun Liu, Jie Zhou, Yue Zhang
Although pre-trained sequence-to-sequence models have achieved great success in dialogue response generation, chatbots still suffer from generating inconsistent responses in real-world practice, especially in multi-turn settings. We argue that this can be caused by a discrepancy between training and real-world testing. At training time, a chatbot generates responses with the golden context, while at test time it has to generate based on the context consisting of…

Wizard of Wikipedia: Knowledge-Powered Conversational agents

The best-performing dialogue models can conduct knowledgeable discussions on open-domain topics, as evaluated by both automatic metrics and human evaluations, while a new benchmark allows further improvements in this important research direction to be measured.

Fine-Tuning Language Models from Human Preferences

This paper builds on advances in generative pretraining of language models to apply reward learning to four natural language tasks: continuing text with positive sentiment or physically descriptive language, and summarization tasks on the TL;DR and CNN/Daily Mail datasets.

Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks

This work proposes a curriculum learning strategy to gently change the training process from a fully guided scheme using the true previous token, towards a less guided scheme which mostly uses the generated token instead.
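The curriculum described above can be sketched in a framework-agnostic way: a decaying schedule gives the probability of feeding the true previous token, and a coin flip at each step picks between the gold token and the model's own prediction. The inverse-sigmoid decay is one of the schedules discussed in the paper; the function and parameter names here are illustrative, not the authors' implementation.

```python
import math
import random

def teacher_forcing_prob(step: int, k: float = 10.0) -> float:
    """Inverse-sigmoid decay: starts near 1 (always feed the gold token)
    and approaches 0 (mostly feed the model's own prediction)."""
    return k / (k + math.exp(step / k))

def next_decoder_input(gold_token, predicted_token, step: int,
                       rng: random.Random):
    """Flip a coin with the scheduled probability to decide whether the
    decoder sees the gold token or its previous prediction."""
    if rng.random() < teacher_forcing_prob(step):
        return gold_token
    return predicted_token
```

Early in training the schedule keeps the process fully guided; as `step` grows, the generated token is used more often, gently exposing the model to its own errors.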

Attention is All you Need

A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by its successful application to English constituency parsing with both large and limited training data.

Knowledge Enhanced Fine-Tuning for Better Handling Unseen Entities in Dialogue Generation

This work introduces two auxiliary training objectives: Interpret Masked Word, which conjectures the meaning of the masked entity given the context; and Hypernym Generation, which predicts the hypernym of the entity based on the context.

Adaptive Bridge between Training and Inference for Dialogue Generation

A novel adaptive switching mechanism is proposed that learns to transition automatically between ground-truth learning and generated learning based on a word-level matching score, such as cosine similarity, achieving significant improvements in both metric-based evaluation and human evaluation.

Don’t be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System

This paper introduces CI-ToD, a novel dataset for Consistency Identification in Task-oriented Dialog systems; it not only annotates a single label that lets a model judge whether the system response is contradictory, but also provides more fine-grained labels that indicate which inconsistency sources lead to the contradiction.

Scheduled Sampling Based on Decoding Steps for Neural Machine Translation

This work proposes scheduled sampling methods based on decoding steps, increasing the selection chance of predicted tokens as decoding steps grow, so that training more realistically simulates the inference scenario and thus better bridges the gap between training and inference.

WeaSuL: Weakly Supervised Dialogue Policy Learning: Reward Estimation for Multi-turn Dialogue

Empirical studies on two benchmarks indicate that the model significantly improves response quality and leads to successful conversations under both automatic evaluation and human judgment.

Confidence-Aware Scheduled Sampling for Neural Machine Translation

Confidence-aware scheduled sampling is proposed: it quantifies real-time model competence by the confidence of model predictions and designs fine-grained schedule strategies on that basis, significantly outperforming the Transformer and vanilla scheduled sampling in both translation quality and convergence speed.
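The core idea can be sketched as follows: instead of a fixed decay schedule, the model's own prediction confidence (the softmax probability of its argmax token) decides whether the predicted token replaces the gold one. This is a minimal illustration of the principle; the threshold value and helper names are assumptions, not the paper's actual schedule strategies.

```python
import math

def softmax_confidence(logits):
    """Confidence = probability of the argmax token under a softmax."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return max(exps) / sum(exps)

def choose_decoder_input(gold_token, predicted_token, logits,
                         threshold: float = 0.9):
    """Feed the model its own prediction only when it is confident enough;
    otherwise fall back to the ground-truth token."""
    if softmax_confidence(logits) >= threshold:
        return predicted_token
    return gold_token
```

With a peaked distribution the prediction is kept; with a flat one the gold token is restored, so exposure to model outputs tracks competence rather than a global timetable.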