Continual Sequence Generation with Adaptive Compositional Modules

@inproceedings{zhang2022continual,
  title={Continual Sequence Generation with Adaptive Compositional Modules},
  author={Yanzhe Zhang and Xuezhi Wang and Diyi Yang}
}
Continual learning is essential for real-world deployment when there is a need to quickly adapt the model to new tasks without forgetting knowledge of old tasks. Existing work on continual sequence generation either always reuses existing parameters to learn new tasks, which is vulnerable to catastrophic forgetting on dissimilar tasks, or blindly adds new parameters for every new task, which could prevent knowledge sharing between similar tasks. To get the best of both worlds, in this work, we… 
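One way to make this reuse-vs-expand trade-off concrete is to compare, for each new task, the held-out loss of reusing every existing module against that of training a fresh one. The sketch below is hypothetical and not the paper's actual algorithm; the function name, loss dictionary, and `margin` parameter are all illustrative:

```python
def choose_module(val_loss_by_module, new_module_loss, margin=0.05):
    """Decide whether a new task should reuse an existing module or get a new one.

    val_loss_by_module: held-out loss on the new task when reusing each
    existing module. new_module_loss: held-out loss with a freshly added
    module. Reuse the best existing module unless a fresh module beats it
    by more than `margin` (which guards against blindly adding parameters).
    """
    best_name, best_loss = min(val_loss_by_module.items(), key=lambda kv: kv[1])
    if new_module_loss < best_loss - margin:
        return "new"
    return best_name

# A dissimilar task benefits enough from a fresh module; a similar one reuses.
decision_similar = choose_module({"taskA": 0.9, "taskB": 0.4}, new_module_loss=0.5)
decision_dissimilar = choose_module({"taskA": 0.9}, new_module_loss=0.3)
```

Reusing when losses are close preserves knowledge sharing between similar tasks; adding only on a clear win limits parameter growth.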


LAMOL: LAnguage MOdeling for Lifelong Language Learning
The results show that LAMOL prevents catastrophic forgetting without any sign of intransigence and can perform five very different language tasks sequentially with only one model.
Continual Learning in Task-Oriented Dialogue Systems
A first-ever continual learning benchmark for task-oriented dialogue systems, with 37 domains to be learned continuously in both modularized and end-to-end settings, is proposed, along with a simple yet effective architectural method based on residual adapters.
AdapterDrop: On the Efficiency of Adapters in Transformers
This paper proposes AdapterDrop, which removes adapters from lower transformer layers during training and inference. The approach incorporates concepts from all three directions and can dynamically reduce computational overhead when performing inference over multiple tasks simultaneously, with minimal decrease in task performance.
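The core idea can be sketched as skipping the adapter call in the lowest layers of the network. This is a minimal illustration assuming per-layer `layer` and `adapter` callables, not the paper's implementation; `drop_first_n` is an illustrative parameter:

```python
def forward_with_adapter_drop(h, layers, adapters, drop_first_n=3):
    """Apply transformer layers, inserting adapters only above layer drop_first_n."""
    for i, (layer, adapter) in enumerate(zip(layers, adapters)):
        h = layer(h)          # every layer always runs
        if i >= drop_first_n: # lower layers skip their adapter entirely
            h = adapter(h)
    return h

# Toy stand-ins: each "layer" adds 1, each "adapter" multiplies by 10.
layers = [lambda x: x + 1] * 4
adapters = [lambda x: x * 10] * 4
out = forward_with_adapter_drop(0, layers, adapters, drop_first_n=3)
```

Since the dropped adapters are never invoked, their compute cost disappears at inference time, which is where the efficiency gain comes from.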
Lifelong Language Knowledge Distillation
Lifelong Language Knowledge Distillation (L2KD) is presented: a simple but efficient method that can be easily applied to existing lifelong language learning (LLL) architectures to mitigate performance degradation.
Continual Learning for Natural Language Generation in Task-oriented Dialog Systems
This work proposes ARPER (Adaptively Regularized Prioritized Exemplar Replay), which replays prioritized historical exemplars together with an adaptive regularization technique based on Elastic Weight Consolidation.
Parameter-Efficient Transfer Learning for NLP
To demonstrate the adapter's effectiveness, the recently proposed BERT Transformer model is transferred to 26 diverse text classification tasks, including the GLUE benchmark; adapters attain near state-of-the-art performance while adding only a few parameters per task.
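A bottleneck adapter of this kind can be sketched in a few lines of NumPy (sizes and names are illustrative; zero-initializing the up-projection makes the module start as an identity, a common simplification of the paper's near-zero initialization):

```python
import numpy as np

class Adapter:
    """Bottleneck adapter: down-project to r dims, ReLU, up-project, residual."""

    def __init__(self, d, r, seed=0):
        rng = np.random.default_rng(seed)
        # r << d keeps the added parameter count small (2*d*r per adapter).
        self.W_down = rng.normal(0.0, 0.02, size=(d, r))
        # Zero up-projection => the adapter is initially the identity map,
        # so inserting it does not disturb the pretrained model.
        self.W_up = np.zeros((r, d))

    def __call__(self, h):
        z = np.maximum(h @ self.W_down, 0.0)  # ReLU nonlinearity
        return h + z @ self.W_up              # residual connection

adapter = Adapter(d=768, r=64)
h = np.ones((2, 768))
out = adapter(h)  # equals h at initialization, before any training
```

Only `W_down` and `W_up` are trained per task while the backbone stays frozen, which is what makes the method parameter-efficient.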
Overcoming catastrophic forgetting in neural networks
It is shown that it is possible to overcome this limitation of connectionist models and train networks that maintain expertise on tasks they have not experienced for a long time, by selectively slowing down learning on the weights important for previous tasks.
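The mechanism, Elastic Weight Consolidation (EWC), adds a quadratic penalty that anchors each weight to its old-task value in proportion to its estimated Fisher importance. A minimal sketch (variable names and the toy numbers are illustrative):

```python
import numpy as np

def ewc_penalty(theta, theta_old, fisher, lam=1.0):
    """EWC regularizer: (lam/2) * sum_i F_i * (theta_i - theta_old_i)^2."""
    return 0.5 * lam * np.sum(fisher * (theta - theta_old) ** 2)

theta_old = np.array([1.0, -2.0, 0.5])   # weights after the old task
fisher = np.array([10.0, 0.1, 1.0])      # high Fisher => important weight
theta = np.array([1.1, -1.0, 0.5])       # weights during the new task

penalty = ewc_penalty(theta, theta_old, fisher)
```

Moving the important weight (Fisher 10.0) by 0.1 costs as much as moving the unimportant one (Fisher 0.1) by 1.0, which is exactly the "selective slowing" the summary describes: this penalty is added to the new task's loss during training.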
Language Models are Unsupervised Multitask Learners
It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
Pseudo-Recursal: Solving the Catastrophic Forgetting Problem in Deep Neural Networks
This work accomplishes pseudo-rehearsal by using a Generative Adversarial Network to generate items so that the deep network can learn to sequentially classify the CIFAR-10, SVHN and MNIST datasets.
Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems
A statistical language generator based on a semantically controlled Long Short-term Memory (LSTM) structure that can learn from unaligned data by jointly optimising sentence planning and surface realisation using a simple cross entropy training criterion, and language variation can be easily achieved by sampling from output candidates.