Hyperparameter-free Continuous Learning for Domain Classification in Natural Language Understanding

Ting Hua, Yilin Shen, Changsheng Zhao, Yen-Chang Hsu, Hongxia Jin
Domain classification is a fundamental task in natural language understanding (NLU), and it often requires fast accommodation of newly emerging domains. This constraint makes it impossible to retrain on all previous domains, even when their data remain accessible to the new model. Most existing continual learning approaches suffer from low accuracy and performance fluctuation, especially when the distributions of old and new data differ significantly. In fact, the key real-world problem is not the…


Language model compression with weighted low-rank factorization

The Fisher-Weighted SVD method can directly compress a task-specific model while achieving better performance than other compact-model strategies that require expensive model pre-training.
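The summary above only names the idea; a minimal sketch of weighted low-rank factorization in that spirit is shown below. The per-row importances stand in for Fisher information and are synthetic here, and the shapes and rank are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 6))       # weight matrix to compress
fisher = rng.random(8) + 0.1      # per-row importance (illustrative stand-in for Fisher info)

# Scale rows by sqrt(importance), take a truncated SVD of the scaled
# matrix, then unscale the left factor. Important rows are thereby
# reconstructed more faithfully than unimportant ones.
Ihat = np.diag(np.sqrt(fisher))
U, s, Vt = np.linalg.svd(Ihat @ W, full_matrices=False)
r = 3                             # target rank
A = np.linalg.inv(Ihat) @ U[:, :r] * s[:r]   # left factor, importance removed
B = Vt[:r]                        # right factor
W_approx = A @ B                  # rank-r approximation of W
```

By the Eckart-Young theorem applied to the scaled matrix, this factorization minimizes the importance-weighted reconstruction error, so it can beat a plain truncated SVD under that weighted norm.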



Continuous Learning for Large-scale Personalized Domain Classification

CoNDA, a neural-based approach for continuous domain adaptation with normalization and regularization, is proposed; it achieves high accuracy on both newly accommodated domains and existing known domains whose input samples come with personal information, and it outperforms the baselines by a large margin.

Efficient Large-Scale Neural Domain Classification with Personalized Attention

This paper proposes a scalable neural architecture with a shared encoder, a novel attention mechanism that incorporates personalization information, and domain-specific classifiers; experiments demonstrate that incorporating personalization significantly improves domain classification accuracy in a setting with thousands of overlapping domains.

A Scalable Neural Shortlisting-Reranking Approach for Large-Scale Domain Classification in Natural Language Understanding

A set of efficient and scalable shortlisting-reranking neural models is proposed for effective large-scale domain classification for IPDAs, and extensive experiments on 1,500 IPDA domains show the effectiveness of the approach.

Parameter-Efficient Transfer Learning for NLP

To demonstrate the adapters' effectiveness, the recently proposed BERT Transformer model is transferred to 26 diverse text classification tasks, including the GLUE benchmark; adapters attain near state-of-the-art performance whilst adding only a few parameters per task.
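The "few parameters per task" claim comes from the bottleneck shape of an adapter. A minimal numpy sketch, with dimensions and the near-zero initialization scale as illustrative assumptions:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class Adapter:
    """Bottleneck adapter sketch: down-project to a small dimension m,
    apply a nonlinearity, up-project back to d, add a residual
    connection. Only ~2*d*m parameters are added per task, and with
    near-zero initialization the module starts close to identity."""
    def __init__(self, d, m, rng):
        self.W_down = rng.normal(scale=1e-3, size=(d, m))
        self.W_up = rng.normal(scale=1e-3, size=(m, d))

    def __call__(self, h):
        return h + relu(h @ self.W_down) @ self.W_up

rng = np.random.default_rng(0)
adapter = Adapter(d=16, m=4, rng=rng)   # hidden size and bottleneck are illustrative
h = rng.normal(size=(2, 16))            # a batch of hidden states
out = adapter(h)
```

In the actual method these modules are inserted inside each Transformer layer and only they (plus layer norms) are trained, but that wiring is omitted here.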

Learning without Forgetting

This work proposes the Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities, and performs favorably compared to commonly used feature extraction and fine-tuning adaptation techniques.
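Preserving old capabilities from new-task data alone is typically done with a distillation-style loss: the new model's old-task outputs are matched to those of the frozen original model. A hedged sketch of that loss term (the temperature and logits below are illustrative, not from the paper):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(old_logits, new_logits, T=2.0):
    """Cross-entropy between the frozen old model's softened outputs
    (soft targets) and the new model's outputs on the same inputs.
    Minimized when the new model reproduces the old model's behavior."""
    p = softmax(old_logits, T)
    q = softmax(new_logits, T)
    return -np.sum(p * np.log(q + 1e-12), axis=-1).mean()
```

During training this term would be added to the ordinary new-task loss, so the network learns the new task while being penalized for drifting on the old one.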

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

iCaRL: Incremental Classifier and Representation Learning

iCaRL can learn many classes incrementally over a long period of time where other strategies quickly fail; this distinguishes it from earlier works that were fundamentally limited to fixed data representations and therefore incompatible with deep learning architectures.

Gradient Episodic Memory for Continual Learning

A model for continual learning, called Gradient Episodic Memory (GEM), is proposed that alleviates forgetting while allowing beneficial transfer of knowledge to previous tasks.
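GEM's core mechanism is a gradient constraint: the new-task gradient must not increase the loss on episodic memories of past tasks. The single-memory case reduces to a simple projection, sketched below (the full method solves a quadratic program over all stored tasks, which this sketch omits):

```python
import numpy as np

def project_gradient(g, g_mem):
    """If the proposed update direction g conflicts with the memory
    task's gradient g_mem (negative dot product, i.e. it would raise
    the old task's loss), project g onto the feasible half-space so
    the constraint g . g_mem >= 0 holds. Otherwise leave g unchanged."""
    dot = g @ g_mem
    if dot < 0:
        g = g - (dot / (g_mem @ g_mem)) * g_mem
    return g
```

After projection, a conflicting gradient becomes orthogonal to the memory gradient, so a small step neither helps nor hurts the remembered task, while non-conflicting gradients pass through untouched.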

Overcoming catastrophic forgetting in neural networks

It is shown that it is possible to overcome this limitation of connectionist models and train networks that can maintain expertise on tasks they have not experienced for a long time, by selectively slowing down learning on the weights important for previous tasks.
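"Selectively slowing down learning on important weights" is realized in EWC as a quadratic penalty anchored at the old-task weights. A minimal sketch, where the Fisher importances and weight values are illustrative placeholders rather than estimated quantities:

```python
import numpy as np

theta_old = np.array([1.0, -2.0, 0.5])   # weights after training the old task
fisher = np.array([10.0, 0.1, 5.0])      # per-weight importance (stand-in for diagonal Fisher)
lam = 1.0                                # penalty strength

def ewc_penalty(theta):
    """Quadratic penalty: deviating from theta_old costs more on
    weights the old task deems important."""
    return 0.5 * lam * np.sum(fisher * (theta - theta_old) ** 2)

def ewc_grad(theta):
    """Gradient of the penalty: pulls important weights back toward
    theta_old much harder than unimportant ones."""
    return lam * fisher * (theta - theta_old)

theta = theta_old + 0.5   # weights that have drifted during new-task training
```

Added to the new-task loss, this term lets low-importance weights move freely while effectively freezing high-importance ones, which is how the network retains old expertise.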

DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition

DeCAF, an open-source implementation of deep convolutional activation features, along with all associated network parameters, is released to enable vision researchers to conduct experiments with deep representations across a range of visual concept learning paradigms.