Hierarchical Inductive Transfer for Continual Dialogue Learning

Shaoxiong Feng, Xuancheng Ren, Kan Li, Xu Sun
Pre-trained models have achieved excellent performance on dialogue tasks. However, as online chit-chat scenarios continually increase, directly fine-tuning these models for each new task not only explodes the capacity of the dialogue system on embedded devices but also causes knowledge forgetting in the pre-trained models and knowledge interference among diverse dialogue tasks. In this work, we propose a hierarchical inductive transfer framework to learn and deploy the dialogue…


Incremental Prompting: Episodic Memory Prompt for Lifelong Event Detection

This paper introduces Episodic Memory Prompts (EMP) to explicitly retain learned task-specific knowledge and conducts a comprehensive analysis of new and old event types in lifelong learning.



Multi-View Feature Representation for Dialogue Generation with Bidirectional Distillation

A novel training framework that extends unidirectional distillation to bidirectional distillation, encouraging the student and its student peers to co-evolve by exchanging complementary knowledge with each other, which improves model generalization without sacrificing training efficiency.
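One way to read "exchanging complementary knowledge" is as a symmetric distillation loss between peer students' output distributions. The sketch below is an assumption about that form, not the paper's exact objective: each peer matches the other via KL divergence over temperature-softened softmax outputs.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    """KL(p || q) for batched categorical distributions."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def bidirectional_distillation_loss(student_logits, peer_logits, temperature=2.0):
    """Symmetric peer-to-peer distillation (hypothetical sketch).

    Real training would add each student's task loss and backpropagate;
    here we only compute the mutual-matching term.
    """
    p = softmax(student_logits, temperature)
    q = softmax(peer_logits, temperature)
    return float(np.mean(kl(p, q) + kl(q, p)))
```

The symmetric sum makes the exchange bidirectional: neither model is a fixed teacher, so both distributions move toward each other during training.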

Wizard of Wikipedia: Knowledge-Powered Conversational Agents

The best performing dialogue models are able to conduct knowledgeable discussions on open-domain topics as evaluated by automatic metrics and human evaluations, while a new benchmark allows for measuring further improvements in this important research direction.

The Adapter-Bot: All-In-One Controllable Conversational Model

The Adapter-Bot is presented, a generative chatbot that uses a fixed backbone conversational model such as DialoGPT and triggers on-demand dialogue skills via different adapters that can be trained independently, thus allowing continual integration of skills without retraining the entire model.

Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models

The recently proposed hierarchical recurrent encoder-decoder neural network is extended to the dialogue domain, and it is demonstrated that this model is competitive with state-of-the-art neural language models and back-off n-gram models.
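The two-level structure of the hierarchical encoder-decoder can be illustrated with a minimal sketch: a word-level encoder summarizes each utterance, and a dialogue-level recurrence runs over the utterance summaries. This is an assumption-laden simplification (mean pooling stands in for the word-level RNN, and a plain tanh RNN for the context encoder); the actual HRED uses learned recurrent encoders at both levels.

```python
import numpy as np

def utterance_encoder(token_vectors):
    """Summarize one utterance by mean-pooling its token embeddings
    (a stand-in for HRED's word-level RNN)."""
    return np.mean(token_vectors, axis=0)

def context_encoder(utterance_vectors, W_h, W_x):
    """Plain tanh RNN over utterance summaries: the dialogue-level encoder."""
    h = np.zeros(W_h.shape[0])
    for x in utterance_vectors:
        h = np.tanh(W_h @ h + W_x @ x)
    return h

def encode_dialogue(dialogue, W_h, W_x):
    """Two-level encoding: tokens -> utterance vectors -> context state."""
    utt_vecs = [utterance_encoder(u) for u in dialogue]
    return context_encoder(utt_vecs, W_h, W_x)
```

The point of the hierarchy is that the context state is updated once per utterance rather than once per token, letting the model track dialogue-level state over long conversations.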

Regularizing Dialogue Generation by Imitating Implicit Scenarios

This work proposes to improve generative dialogue systems from the scenario perspective, where both dialogue history and future conversation are taken into account to implicitly reconstruct the scenario knowledge.

A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues

A neural network-based generative architecture with stochastic latent variables that span a variable number of time steps is proposed; it improves upon recently proposed models, and the latent variables facilitate both the generation of meaningful, long, and diverse responses and the maintenance of dialogue state.

Generating Relevant and Coherent Dialogue Responses using Self-Separated Conditional Variational AutoEncoders

The Self-Separated Conditional Variational AutoEncoder (SepaCVAE) is proposed, which introduces group information to regularize the latent variables, enhancing CVAE by improving the responses' relevance and coherence while maintaining their diversity and informativeness.

Can You Put it All Together: Evaluating Conversational Agents’ Ability to Blend Skills

This work investigates several ways to combine models trained towards isolated capabilities, ranging from simple model aggregation schemes that require minimal additional training, to various forms of multi-task training that encompass several skills at all training stages.

Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders

This work presents a novel framework based on conditional variational autoencoders that captures discourse-level diversity in the encoder, using latent variables to learn a distribution over potential conversational intents and generating diverse responses with only greedy decoders.
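The latent-variable machinery behind CVAE-based response generation rests on two standard pieces: the reparameterization trick for sampling, and a closed-form KL term pulling the recognition distribution toward the prior. A minimal sketch, assuming diagonal Gaussians with a standard-normal prior (the paper's actual prior is conditioned on the dialogue context):

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps so gradients can flow through mu, logvar."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ) per example."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)
```

At test time the model samples z from the prior, so even a greedy decoder produces diverse responses: the diversity lives in z, not in the decoding strategy.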

Parameter-Efficient Transfer Learning for NLP

To demonstrate the adapters' effectiveness, the recently proposed BERT Transformer model is transferred to 26 diverse text classification tasks, including the GLUE benchmark; adapters attain near state-of-the-art performance while adding only a few parameters per task.
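The adapter recurring throughout these works (and in the Adapter-Bot above) is a small bottleneck inserted into a frozen backbone. A minimal sketch of the standard form: down-project to a low rank, apply a nonlinearity, up-project, and add a residual connection so that a near-zero adapter is near-identity. Weight names here are illustrative, not from any specific library.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def adapter(h, W_down, b_down, W_up, b_up):
    """Bottleneck adapter layer.

    h:       hidden states of the frozen backbone, shape (..., d)
    W_down:  (d, r) down-projection to bottleneck size r << d
    W_up:    (r, d) up-projection back to model dimension
    The residual connection means zero-initialized projections leave
    the backbone's computation unchanged at the start of training.
    """
    z = relu(h @ W_down + b_down)
    return h + z @ W_up + b_up
```

Only the adapter parameters (roughly 2*d*r per layer) are trained per task, which is why a single backbone can host many dialogue skills without the per-task storage blow-up the abstract describes.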