Continual Sequence Generation with Adaptive Compositional Modules
@inproceedings{Zhang2022ContinualSG,
  title={Continual Sequence Generation with Adaptive Compositional Modules},
  author={Yanzhe Zhang and Xuezhi Wang and Diyi Yang},
  booktitle={ACL},
  year={2022}
}
Continual learning is essential for real-world deployment when there is a need to quickly adapt the model to new tasks without forgetting knowledge of old tasks. Existing work on continual sequence generation either always reuses existing parameters to learn new tasks, which is vulnerable to catastrophic forgetting on dissimilar tasks, or blindly adds new parameters for every new task, which could prevent knowledge sharing between similar tasks. To get the best of both worlds, in this work, we…
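As a rough illustration of the idea sketched in the abstract (not the authors' implementation), the snippet below assumes each transformer layer carries a small pool of bottleneck adapters plus a task-to-module routing table: a new task either reuses an existing module, keeping knowledge shared across similar tasks, or receives a fresh module, protecting old tasks from interference. All class and task names are hypothetical.

import torch
import torch.nn as nn
from typing import Optional


class Adapter(nn.Module):
    """Bottleneck adapter inserted alongside a frozen transformer layer."""

    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the pretrained representation intact.
        return x + self.up(torch.relu(self.down(x)))


class AdaptiveLayerPool(nn.Module):
    """Per-layer pool of adapter modules with a task -> module routing table."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.hidden_size = hidden_size
        self.adapters = nn.ModuleList()
        self.route = {}  # task name -> adapter index

    def add_task(self, task: str, reuse_from: Optional[str] = None) -> None:
        if reuse_from is not None:
            # Similar task: reuse an existing module (knowledge sharing).
            self.route[task] = self.route[reuse_from]
        else:
            # Dissimilar task: add a fresh module (protects old knowledge).
            self.adapters.append(Adapter(self.hidden_size))
            self.route[task] = len(self.adapters) - 1

    def forward(self, x: torch.Tensor, task: str) -> torch.Tensor:
        return self.adapters[self.route[task]](x)


pool = AdaptiveLayerPool(hidden_size=768)
pool.add_task("task_a")                       # first task gets a new module
pool.add_task("task_b", reuse_from="task_a")  # a similar task reuses it
out = pool(torch.randn(2, 10, 768), task="task_b")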
References
Showing 1-10 of 58 references
LAMOL: LAnguage MOdeling for Lifelong Language Learning
- ICLR, 2020
The results show that LAMOL prevents catastrophic forgetting without any sign of intransigence and can perform five very different language tasks sequentially with only one model.
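The pseudo-sample replay that LAMOL relies on can be sketched as below; the generate_pseudo_sample and train_step methods stand in for the single language model that acts as both task solver and data generator, and are assumed interfaces rather than the paper's API.

import random

def lamol_style_training(model, tasks, gen_ratio=0.2):
    """Sequentially train one language model on a list of task datasets."""
    for i, task_data in enumerate(tasks):
        replay = []
        if i > 0:
            # Before learning task i, the same model generates pseudo-samples
            # that imitate the training data of all previously seen tasks.
            n_pseudo = int(gen_ratio * len(task_data))
            replay = [model.generate_pseudo_sample() for _ in range(n_pseudo)]
        mixed = list(task_data) + replay
        random.shuffle(mixed)
        for example in mixed:
            # The same language-modeling objective is used for real and
            # generated examples, so no old data needs to be stored.
            model.train_step(example)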
Continual Learning in Task-Oriented Dialogue Systems
- EMNLP, 2021
A first-ever continual learning benchmark for task-oriented dialogue systems is proposed, with 37 domains to be learned continuously in both modularized and end-to-end learning settings, along with a simple yet effective architectural method based on residual adapters.
AdapterDrop: On the Efficiency of Adapters in Transformers
- EMNLP, 2021
This paper proposes AdapterDrop, which removes adapters from lower transformer layers during training and inference; it can dynamically reduce the computational overhead when performing inference over multiple tasks simultaneously, with minimal decrease in task performance.
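A minimal sketch of that idea, assuming a layer-aligned list of adapters whose forward passes already include their residual connections (names are illustrative):

import torch
import torch.nn as nn

def forward_with_adapterdrop(hidden: torch.Tensor,
                             layers: nn.ModuleList,
                             adapters: nn.ModuleList,
                             n_drop: int) -> torch.Tensor:
    """Run a transformer stack, skipping adapters in the lowest n_drop layers."""
    for i, (layer, adapter) in enumerate(zip(layers, adapters)):
        hidden = layer(hidden)
        if i >= n_drop:
            # Adapters are kept only in the upper layers; dropping the lower
            # ones saves compute with little loss in task performance.
            hidden = adapter(hidden)
    return hidden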
Lifelong Language Knowledge Distillation
- EMNLP, 2020
Lifelong Language Knowledge Distillation (L2KD) is presented, a simple but efficient method that can be easily applied to existing lifelong language learning (LLL) architectures to mitigate performance degradation.
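As a rough illustration of the word-level distillation such a method builds on (the temperature and interfaces here are assumptions, not the paper's exact formulation):

import torch
import torch.nn.functional as F

def word_level_distillation_loss(student_logits: torch.Tensor,
                                 teacher_logits: torch.Tensor,
                                 temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between softened teacher and student token distributions."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (t * t)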
Continual Learning for Natural Language Generation in Task-oriented Dialog Systems
- Findings of EMNLP, 2020
This work proposes ARPER (Adaptively Regularized Prioritized Exemplar Replay), which replays prioritized historical exemplars together with an adaptive regularization technique based on Elastic Weight Consolidation.
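A hedged sketch of the replay side of such a method is given below, with the priority criterion and data interfaces left as assumptions; the EWC-style penalty itself is sketched under the "Overcoming catastrophic forgetting in neural networks" entry further down.

import random

def select_prioritized_exemplars(examples, priority_scores, k):
    """Keep the k highest-priority examples from an old task for later replay."""
    ranked = sorted(zip(priority_scores, examples),
                    key=lambda pair: pair[0], reverse=True)
    return [example for _, example in ranked[:k]]

def mix_with_exemplars(new_task_data, exemplar_store):
    """Interleave the new task's data with stored exemplars from all old tasks."""
    mixed = list(new_task_data)
    for old_task_exemplars in exemplar_store.values():
        mixed.extend(old_task_exemplars)
    random.shuffle(mixed)
    return mixed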
Parameter-Efficient Transfer Learning for NLP
- ICML, 2019
To demonstrate the adapters' effectiveness, the recently proposed BERT Transformer model is transferred to 26 diverse text classification tasks, including the GLUE benchmark, where adapters attain near state-of-the-art performance while adding only a few parameters per task.
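To illustrate the parameter-efficiency argument, the sketch below assumes the adapters are held in a separate module list; it freezes the pretrained backbone so that each new task trains, and stores, only the adapter weights (the helper name is hypothetical).

import torch.nn as nn

def make_parameter_efficient(backbone: nn.Module, adapters: nn.ModuleList) -> None:
    """Freeze the pretrained backbone so that only the adapters are trained."""
    for p in backbone.parameters():
        p.requires_grad = False
    for p in adapters.parameters():
        p.requires_grad = True
    trainable = sum(p.numel() for p in adapters.parameters())
    total = trainable + sum(p.numel() for p in backbone.parameters())
    # Each new task then adds only the adapters' parameters.
    print(f"trainable fraction per task: {trainable / total:.2%}")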
Overcoming catastrophic forgetting in neural networks
- Proceedings of the National Academy of Sciences, 2017
It is shown that it is possible to overcome this limitation of connectionist models and train networks that can maintain expertise on tasks they have not experienced for a long time, by selectively slowing down learning on the weights important for previous tasks.
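A compact sketch of that mechanism, with the Fisher information approximated by averaged squared gradients and all interfaces assumed:

import torch

def estimate_fisher_diag(model, data_loader, loss_fn):
    """Approximate per-parameter importance by averaged squared gradients."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    n_batches = 0
    for batch in data_loader:
        model.zero_grad()
        loss_fn(model, batch).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
        n_batches += 1
    return {n: f / max(n_batches, 1) for n, f in fisher.items()}

def ewc_penalty(model, fisher, old_params, lam):
    """Quadratic penalty that slows learning on weights important for old tasks."""
    penalty = 0.0
    for n, p in model.named_parameters():
        penalty = penalty + (fisher[n] * (p - old_params[n]) ** 2).sum()
    return 0.5 * lam * penalty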
Language Models are Unsupervised Multitask Learners
- 2019
It is demonstrated that language models begin to learn a range of natural language processing tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
Pseudo-Recursal: Solving the Catastrophic Forgetting Problem in Deep Neural Networks
- arXiv, 2018
This work accomplishes pseudo-rehearsal by using a Generative Adversarial Network to generate items so that the deep network can learn to sequentially classify the CIFAR-10, SVHN and MNIST datasets.
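A hedged sketch of the pseudo-rehearsal step, assuming a generator trained earlier to imitate old-task inputs (interfaces are illustrative):

import torch

def pseudo_rehearsal_batch(generator, old_model, batch_size, noise_dim=100):
    """Generate 'old-looking' inputs and label them with the previous network."""
    z = torch.randn(batch_size, noise_dim)
    pseudo_inputs = generator(z)
    with torch.no_grad():
        # The previous network's own predictions become the rehearsal targets,
        # so no original training data needs to be stored.
        pseudo_targets = old_model(pseudo_inputs).argmax(dim=-1)
    return pseudo_inputs, pseudo_targets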
Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems
- EMNLP, 2015
A statistical language generator is presented, based on a semantically controlled Long Short-term Memory (LSTM) structure, that can learn from unaligned data by jointly optimising sentence planning and surface realisation with a simple cross-entropy training criterion; language variation can be easily achieved by sampling from output candidates.
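A simplified, hypothetical sketch of the semantic control mechanism, in which a learned reading gate consumes a dialogue-act vector step by step and the remainder conditions the cell state; this illustrates the idea rather than the paper's exact parameterisation.

import torch
import torch.nn as nn

class SCLSTMCell(nn.Module):
    """LSTM cell whose state is conditioned on a dialogue-act (DA) vector."""

    def __init__(self, input_size: int, hidden_size: int, da_size: int):
        super().__init__()
        self.lstm = nn.LSTMCell(input_size, hidden_size)
        self.read_gate = nn.Linear(input_size + hidden_size, da_size)
        self.da_proj = nn.Linear(da_size, hidden_size, bias=False)

    def forward(self, x, state):
        h, c, d = state                      # hidden, cell, remaining DA vector
        # The reading gate decides which semantic slots are "consumed" now.
        r = torch.sigmoid(self.read_gate(torch.cat([x, h], dim=-1)))
        d = r * d
        h, c = self.lstm(x, (h, c))
        # The remaining dialogue act conditions the cell state, steering the
        # surface realisation toward the slots not yet expressed.
        c = c + torch.tanh(self.da_proj(d))
        return h, (h, c, d)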