Multi-View Feature Representation for Dialogue Generation with Bidirectional Distillation

@inproceedings{Feng2021MultiViewFR,
  title={Multi-View Feature Representation for Dialogue Generation with Bidirectional Distillation},
  author={Shaoxiong Feng and Xuancheng Ren and Kan Li and Xu Sun},
  booktitle={AAAI},
  year={2021}
}
Neural dialogue models suffer from low-quality responses when interacting with users in practice, demonstrating difficulty in generalizing beyond the training data. Recently, knowledge distillation has been used to successfully regularize the student by transferring knowledge from the teacher. However, the teacher and the student are trained on the same dataset and tend to learn similar feature representations, whereas the most general knowledge should be found through differences. The finding of general…
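
As a rough illustration of the bidirectional transfer idea, the following minimal sketch trains two peer generators with a symmetric (mutual) distillation objective: each model fits the gold tokens and is additionally pulled toward the other's softened predictions. This is a generic deep-mutual-learning loss, not the paper's exact multi-view objective; the peer logits, temperature, and alpha below are illustrative.

import torch
import torch.nn.functional as F

def mutual_distill_loss(logits_a, logits_b, targets, temperature=2.0, alpha=0.5):
    """Cross-entropy on gold tokens plus a KL term toward the peer's soft output."""
    ce_a = F.cross_entropy(logits_a, targets)
    ce_b = F.cross_entropy(logits_b, targets)
    # Softened distributions; the "teacher" side is detached in each direction.
    kl_a = F.kl_div(F.log_softmax(logits_a / temperature, dim=-1),
                    F.softmax(logits_b.detach() / temperature, dim=-1),
                    reduction="batchmean") * temperature ** 2
    kl_b = F.kl_div(F.log_softmax(logits_b / temperature, dim=-1),
                    F.softmax(logits_a.detach() / temperature, dim=-1),
                    reduction="batchmean") * temperature ** 2
    return ce_a + alpha * kl_a, ce_b + alpha * kl_b

# Toy usage with random vocabulary-sized logits.
vocab, batch = 100, 8
logits_a, logits_b = torch.randn(batch, vocab), torch.randn(batch, vocab)
targets = torch.randint(0, vocab, (batch,))
print(mutual_distill_loss(logits_a, logits_b, targets))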

Citations

Hierarchical Inductive Transfer for Continual Dialogue Learning

A hierarchical inductive transfer framework is proposed that enables new tasks to use general knowledge in the base adapter without being misled by diverse knowledge in task-specific adapters and obtains comparable performance under deployment-friendly model capacity.
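
A minimal sketch of the kind of bottleneck adapter such a framework builds on (down-projection, nonlinearity, up-projection, residual); the base/task-specific hierarchy described above is a design on top of modules like this, and the sizes here are illustrative.

import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden_size=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.ReLU()

    def forward(self, x):
        # The residual connection keeps the frozen backbone's representation intact.
        return x + self.up(self.act(self.down(x)))

h = torch.randn(2, 10, 768)   # (batch, seq_len, hidden)
print(Adapter()(h).shape)     # torch.Size([2, 10, 768])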

WeaSuL: Weakly Supervised Dialogue Policy Learning: Reward Estimation for Multi-turn Dialogue

Empirical studies with two benchmarks indicate that the model significantly improves response quality and leads to more successful conversations, according to both automatic evaluation and human judgment.

A Response Generator with Response-Aware Encoder for Generating Specific and Relevant Responses

A sequence-to-sequence response generator with a response-aware encoder is proposed, which exploits gold responses by reflecting them in the query representation, and joint learning of a teacher and a student relevancy scorer is adopted.

Graph and Question Interaction Aware Graph2Seq Model for Knowledge Base Question Generation

A graph and question interaction enhanced Graph2Seq model is proposed, in which an encoder-decoder parallel enhancement mechanism is designed and knowledge distillation is applied to both the intermediate representation and the prediction distribution to inject knowledge of the target question into the graph representation.
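
A generic sketch of distilling both an intermediate representation (feature matching) and the output prediction distribution; the hidden states and logits below are placeholder tensors standing in for the Graph2Seq encoder-decoder internals.

import torch
import torch.nn.functional as F

hidden, vocab = 256, 1000
student_hidden = torch.randn(8, hidden, requires_grad=True)
teacher_hidden = torch.randn(8, hidden)
student_logits = torch.randn(8, vocab, requires_grad=True)
teacher_logits = torch.randn(8, vocab)

feature_loss = F.mse_loss(student_hidden, teacher_hidden)      # intermediate representation
pred_loss = F.kl_div(F.log_softmax(student_logits, dim=-1),    # prediction distribution
                     F.softmax(teacher_logits, dim=-1),
                     reduction="batchmean")
(feature_loss + pred_loss).backward()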

References

SHOWING 1-10 OF 62 REFERENCES

Natural Language Generation for Effective Knowledge Distillation

On four datasets in sentiment classification, sentence similarity, and linguistic acceptability, this approach improves upon previous methods, and outperforms OpenAI GPT, a deep pretrained transformer, on three of the datasets, while using a single-layer bidirectional LSTM that runs at least ten times faster.
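
A minimal sketch of the general recipe: a single-layer bidirectional LSTM student fit to a teacher's soft class probabilities (e.g., over augmented or generated text). The teacher output here is a stand-in tensor, and the sizes are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTMStudent(nn.Module):
    def __init__(self, vocab=5000, emb=128, hidden=256, classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, classes)

    def forward(self, tokens):
        states, _ = self.lstm(self.emb(tokens))
        return self.out(states.mean(dim=1))   # mean-pool over time

student = BiLSTMStudent()
tokens = torch.randint(0, 5000, (4, 20))
teacher_probs = F.softmax(torch.randn(4, 2), dim=-1)   # placeholder teacher output
loss = F.kl_div(F.log_softmax(student(tokens), dim=-1), teacher_probs,
                reduction="batchmean")
loss.backward()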

Sequence-Level Knowledge Distillation

It is demonstrated that standard knowledge distillation applied to word-level prediction can be effective for NMT, and two novel sequence-level versions of knowledge distillation are introduced that further improve performance and, somewhat surprisingly, seem to eliminate the need for beam search.
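
The two objectives can be contrasted in a short sketch: word-level KD matches the student's per-position distributions to the teacher's, while sequence-level KD simply trains the student on the teacher's beam-search output as pseudo-targets. The tensors below are placeholders for real model outputs.

import torch
import torch.nn.functional as F

batch, seq_len, vocab = 4, 12, 1000
student_logits = torch.randn(batch, seq_len, vocab, requires_grad=True)
teacher_logits = torch.randn(batch, seq_len, vocab)

# Word-level KD: token-by-token KL divergence.
word_kd = F.kl_div(F.log_softmax(student_logits, dim=-1),
                   F.softmax(teacher_logits, dim=-1),
                   reduction="batchmean")

# Sequence-level KD: cross-entropy against the teacher's decoded sequence.
teacher_beam_output = torch.randint(0, vocab, (batch, seq_len))  # stand-in for beam search
seq_kd = F.cross_entropy(student_logits.reshape(-1, vocab),
                         teacher_beam_output.reshape(-1))
(word_kd + seq_kd).backward()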

Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders

This work presents a novel framework based on conditional variational autoencoders that captures discourse-level diversity in the encoder, uses latent variables to learn a distribution over potential conversational intents, and generates diverse responses using only greedy decoders.
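
A minimal sketch of the CVAE objective, assuming pre-computed context and response encodings: a recognition network q(z | c, x) and a prior network p(z | c) over a latent intent variable, trained with reconstruction loss plus KL(q || p). The linear layers and the zero reconstruction term are placeholders.

import torch
import torch.nn as nn

hidden, latent = 256, 64
recog = nn.Linear(2 * hidden, 2 * latent)   # q(z | context, response) -> (mu, logvar)
prior = nn.Linear(hidden, 2 * latent)       # p(z | context)           -> (mu, logvar)

context = torch.randn(8, hidden)    # placeholder context encoding
response = torch.randn(8, hidden)   # placeholder response encoding

mu_q, logvar_q = recog(torch.cat([context, response], dim=-1)).chunk(2, dim=-1)
mu_p, logvar_p = prior(context).chunk(2, dim=-1)

z = mu_q + torch.randn_like(mu_q) * (0.5 * logvar_q).exp()   # reparameterization trick

# KL divergence between the two diagonal Gaussians q and p.
kl = 0.5 * (logvar_p - logvar_q
            + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp() - 1).sum(-1).mean()
recon = torch.tensor(0.0)   # decoder NLL of the response given [context; z] would go here
elbo_loss = recon + kl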

Distilling Knowledge Learned in BERT for Text Generation

A novel approach, Conditional Masked Language Modeling (C-MLM), is presented to enable fine-tuning of BERT on target generation tasks; it significantly outperforms strong Transformer baselines on multiple language generation tasks such as machine translation and text summarization.

Learning a Matching Model with Co-teaching for Multi-turn Response Selection in Retrieval-based Dialogue Systems

To learn a robust matching model from noisy training data, a general co-teaching framework with three specific teaching strategies that cover both teaching with loss functions and teaching with data curriculum is proposed.
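
A minimal sketch of one co-teaching step under the usual small-loss assumption: each network selects the low-loss (likely clean) fraction of the batch for its peer to update on. The models, data, and keep ratio are placeholders.

import torch
import torch.nn as nn
import torch.nn.functional as F

def small_loss_indices(model, x, y, keep_ratio=0.8):
    with torch.no_grad():
        losses = F.cross_entropy(model(x), y, reduction="none")
    k = max(1, int(keep_ratio * len(losses)))
    return losses.topk(k, largest=False).indices   # smallest-loss samples

net_a, net_b = nn.Linear(32, 2), nn.Linear(32, 2)
x, y = torch.randn(16, 32), torch.randint(0, 2, (16,))

idx_for_b = small_loss_indices(net_a, x, y)   # A picks clean-looking samples for B
idx_for_a = small_loss_indices(net_b, x, y)   # B picks clean-looking samples for A
loss_a = F.cross_entropy(net_a(x[idx_for_a]), y[idx_for_a])
loss_b = F.cross_entropy(net_b(x[idx_for_b]), y[idx_for_b])
(loss_a + loss_b).backward()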

DialogWAE: Multimodal Response Generation with Conditional Wasserstein Auto-Encoder

DialogWAE, a conditional Wasserstein autoencoder specially designed for dialogue modeling, is proposed; it models the data distribution by training a GAN within the latent variable space and uses a Gaussian mixture prior network to enrich the latent space.
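
A heavily simplified sketch of adversarial matching in the latent space: a prior generator transforms context-conditioned noise into latent codes and a discriminator separates them from posterior codes. The Gaussian-mixture prior and the Wasserstein formulation of the paper are omitted; all modules are placeholder linear layers and standard GAN losses are used instead.

import torch
import torch.nn as nn
import torch.nn.functional as F

hidden, latent, noise_dim = 256, 64, 32
posterior_net = nn.Linear(2 * hidden, latent)      # q(z | context, response)
prior_gen = nn.Linear(hidden + noise_dim, latent)  # transforms noise into prior z
disc = nn.Linear(hidden + latent, 1)               # D(context, z)

context, response = torch.randn(8, hidden), torch.randn(8, hidden)
z_post = posterior_net(torch.cat([context, response], dim=-1))
z_prior = prior_gen(torch.cat([context, torch.randn(8, noise_dim)], dim=-1))

# Discriminator step: tell posterior codes from prior codes.
d_loss = F.binary_cross_entropy_with_logits(
    disc(torch.cat([context, z_post.detach()], dim=-1)), torch.ones(8, 1)) + \
    F.binary_cross_entropy_with_logits(
    disc(torch.cat([context, z_prior.detach()], dim=-1)), torch.zeros(8, 1))
d_loss.backward()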

A Neural Conversational Model

A simple approach to conversational modeling which uses the recently proposed sequence-to-sequence framework and is able to extract knowledge from both a domain-specific dataset and a large, noisy, general-domain dataset of movie subtitles.
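
A minimal GRU encoder-decoder in the spirit of the sequence-to-sequence framework; toy vocabulary and sizes, teacher-forced decoding only.

import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab=1000, emb=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, src, tgt):
        _, h = self.encoder(self.emb(src))        # final encoder state seeds the decoder
        dec, _ = self.decoder(self.emb(tgt), h)   # teacher forcing on target tokens
        return self.out(dec)

model = Seq2Seq()
src = torch.randint(0, 1000, (2, 15))
tgt = torch.randint(0, 1000, (2, 10))
print(model(src, tgt).shape)   # torch.Size([2, 10, 1000])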

Born Again Neural Networks

This work studies KD from a new perspective: rather than compressing models, students are trained with the same parameterization as their teachers, and significant advantages are shown from transferring knowledge between DenseNets and ResNets in either direction.
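
A sketch of the born-again loop: each generation is a fresh, identically parameterized model trained on the ground truth plus the previous generation's soft predictions, and then becomes the next teacher. The model, data, and loss weighting are placeholders.

import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_model():
    return nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

x, y = torch.randn(64, 32), torch.randint(0, 10, (64,))
teacher = make_model()   # generation 0 would normally be trained on (x, y) first

for generation in range(3):
    student = make_model()                         # same architecture as the teacher
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    for _ in range(100):                           # toy training loop
        logits = student(x)
        with torch.no_grad():
            soft = F.softmax(teacher(x), dim=-1)
        loss = F.cross_entropy(logits, y) + \
               F.kl_div(F.log_softmax(logits, dim=-1), soft, reduction="batchmean")
        opt.zero_grad(); loss.backward(); opt.step()
    teacher = copy.deepcopy(student)               # the student becomes the next teacher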

Deep Reinforcement Learning for Dialogue Generation

This work simulates dialogues between two virtual agents, using policy gradient methods to reward sequences that display three useful conversational properties: informativity (non-repetitive turns), coherence, and ease of answering.
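
A minimal REINFORCE sketch of the policy-gradient update: sample a response from the generator, score it with a scalar reward standing in for the combined conversational properties, and reinforce the sampled tokens against a mean baseline. The reward function and the logits are placeholders.

import torch
import torch.nn.functional as F

def reward_fn(response_tokens):
    # Stand-in for a combination of informativity, coherence, and ease of answering.
    return torch.rand(response_tokens.size(0))

batch, seq_len, vocab = 4, 10, 1000
logits = torch.randn(batch, seq_len, vocab, requires_grad=True)   # generator output

probs = F.softmax(logits, dim=-1)
sampled = torch.multinomial(probs.view(-1, vocab), 1).view(batch, seq_len)
log_probs = F.log_softmax(logits, dim=-1).gather(-1, sampled.unsqueeze(-1)).squeeze(-1)

reward = reward_fn(sampled)    # one scalar reward per sampled response
baseline = reward.mean()       # simple baseline for variance reduction
loss = -((reward - baseline).unsqueeze(-1) * log_probs).mean()
loss.backward()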

Distilling Knowledge for Fast Retrieval-based Chat-bots

This paper proposes a new cross-encoder architecture and transfers knowledge from this model to a bi-encoder model using distillation, which effectively boosts bi-encoder performance at no cost during inference time.
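
A minimal sketch of distilling a (slow) cross-encoder's relevance scores into a (fast) bi-encoder with a regression loss; the encoders are stand-in linear layers and the teacher scores are placeholders.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BiEncoder(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.ctx_enc = nn.Linear(300, dim)   # placeholder context encoder
        self.rsp_enc = nn.Linear(300, dim)   # placeholder response encoder

    def forward(self, ctx, rsp):
        # A dot-product score lets responses be pre-encoded for fast retrieval.
        return (self.ctx_enc(ctx) * self.rsp_enc(rsp)).sum(-1)

student = BiEncoder()
ctx, rsp = torch.randn(16, 300), torch.randn(16, 300)
teacher_scores = torch.randn(16)   # stand-in for cross-encoder outputs

loss = F.mse_loss(student(ctx, rsp), teacher_scores)
loss.backward()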
...