Corpus ID: 59599816

Parameter-Efficient Transfer Learning for NLP

@inproceedings{Houlsby2019ParameterEfficientTL,
  title={Parameter-Efficient Transfer Learning for NLP},
  author={Neil Houlsby and Andrei Giurgiu and Stanislaw Jastrzebski and Bruna Morrone and Quentin de Laroussilhe and Andrea Gesmundo and Mona Attariyan and Sylvain Gelly},
  booktitle={ICML},
  year={2019}
}
Fine-tuning large pre-trained models is an effective transfer mechanism in NLP. [...] Adapter modules yield a compact and extensible model; they add only a few trainable parameters per task, and new tasks can be added without revisiting previous ones. The parameters of the original network remain fixed, yielding a high degree of parameter sharing. To demonstrate adapters' effectiveness, we transfer the recently proposed BERT Transformer model to 26 diverse text classification tasks, including the…
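To make the adapter idea above concrete, here is a minimal PyTorch-style sketch of a bottleneck adapter block; the module name, the hidden/bottleneck sizes, and the near-zero initialisation are illustrative assumptions rather than the paper's reference implementation.

import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, plus a
    skip connection. Only these few parameters are trained per task; the
    surrounding pretrained Transformer weights stay frozen."""
    def __init__(self, hidden_size: int = 768, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.act = nn.GELU()
        # Near-identity initialisation: the adapter initially passes the
        # pretrained network's activations through unchanged.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden_states):
        return hidden_states + self.up(self.act(self.down(hidden_states)))

In the paper's setup, two such blocks are inserted into every Transformer layer (after the attention and feed-forward sublayers), and only the adapters, layer norms, and task head are updated during training.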
Citations

Task-to-Task Transfer Learning with Parameter-Efficient Adapter
TLDR: An effective task-to-task transfer learning method with a parameter-efficient adapter on a pre-trained language model, which can be trained on new tasks without hindering performance on those already learned and can overcome catastrophic forgetting.
On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation
TLDR: It is demonstrated that 1) adapter-based tuning outperforms fine-tuning on low-resource and cross-lingual tasks, and 2) it is more robust to overfitting and less sensitive to changes in learning rates.
Compacter: Efficient Low-Rank Hypercomplex Adapter Layers
TLDR: This work proposes COMPACTER, a method for fine-tuning large-scale language models with a better trade-off between task performance and the number of trainable parameters than prior work, accomplished by building on ideas from adapters, low-rank optimization, and parameterized hypercomplex multiplication layers.
How fine can fine-tuning be? Learning efficient language models
TLDR: Fine-tuning of huge language models can be achieved by simply setting a certain number of entries in certain layers of the pre-trained parameters to zero, saving both task-specific parameter storage and computational cost.
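As a rough, hedged illustration of this zeroing idea, the sketch below adapts a frozen pretrained linear layer by storing only a per-task binary mask over its weight entries; the class name, the random mask, and the zero_fraction parameter are hypothetical (the paper selects which entries to zero rather than masking at random).

import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Module):
    """Frozen pretrained linear layer whose adaptation consists solely of a
    task-specific binary mask that zeroes a small fraction of weight entries.
    Only the mask needs to be stored per task (assumes the layer has a bias)."""
    def __init__(self, pretrained: nn.Linear, zero_fraction: float = 0.01):
        super().__init__()
        self.weight = nn.Parameter(pretrained.weight.detach(), requires_grad=False)
        self.bias = nn.Parameter(pretrained.bias.detach(), requires_grad=False)
        # 1 keeps an entry, 0 zeroes it; chosen at random here for brevity.
        mask = (torch.rand_like(self.weight) > zero_fraction).float()
        self.register_buffer("mask", mask)

    def forward(self, x):
        return F.linear(x, self.weight * self.mask, self.bias)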
LoRA: Low-Rank Adaptation of Large Language Models
TLDR: Low-Rank Adaptation, or LoRA, is proposed, which freezes the pre-trained model weights and injects trainable rank-decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.
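A minimal sketch of the LoRA idea described above, assuming a PyTorch setting: the pretrained weight W is kept frozen and only the low-rank factors A and B of the update are trained. The class name, rank, and scaling defaults are assumptions, not the official implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALinear(nn.Module):
    """Linear layer with a frozen pretrained weight and a trainable
    low-rank update: y = x @ (W + scaling * B @ A)^T + b."""
    def __init__(self, pretrained: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        out_features, in_features = pretrained.weight.shape
        self.weight = nn.Parameter(pretrained.weight.detach(), requires_grad=False)
        self.bias = nn.Parameter(pretrained.bias.detach(), requires_grad=False)
        # B starts at zero, so training begins from the frozen layer's behaviour.
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        delta = self.lora_B @ self.lora_A  # rank-r update, shape (out, in)
        return F.linear(x, self.weight + self.scaling * delta, self.bias)

Only lora_A and lora_B need to be updated and stored per task, which at small ranks amounts to a tiny fraction of the original layer's parameters.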
LiST: Lite Self-training Makes Efficient Few-shot Learners
TLDR: A new method, LiST, for efficient fine-tuning of large pre-trained language models (PLMs) in few-shot learning settings, together with a comprehensive study on six NLU tasks that validates its effectiveness.
Co-Tuning for Transfer Learning
TLDR: A two-step framework named Co-Tuning is proposed, which learns the relationship between source categories and target categories from the pre-trained model with calibrated predictions, and works not only on medium-sized datasets but also on large-scale datasets where regularization-based methods bring no gains over vanilla fine-tuning.
Robust Transfer Learning with Pretrained Language Models through Adapters
  • Wenjuan Han, Bo Pang, Ying Nian Wu • ACL/IJCNLP • 2021
TLDR: This work inserts small bottleneck layers (i.e., adapters) within each layer of a pretrained model, then fixes the pretrained layers and trains only the adapter layers on the downstream task data, leading to improved stability and adversarial robustness in transfer learning to various downstream tasks.
SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer
TLDR: A novel prompt-based transfer learning approach called SPoT, which first learns a prompt on one or more source tasks and then uses it to initialize the prompt for a target task; SPoT significantly boosts the performance of Prompt Tuning across many tasks.
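The soft-prompt transfer recipe can be sketched as follows, assuming a PyTorch-style frozen backbone: trainable prompt embeddings are prepended to the model's input embeddings, and a prompt learned on source tasks initialises the target-task prompt. Names and sizes below are illustrative, not SPoT's released code.

import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Trainable prompt vectors prepended to the (frozen) model's input
    embeddings; these are the only task-specific parameters."""
    def __init__(self, prompt_length: int = 20, embed_dim: int = 768):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_length, embed_dim) * 0.02)

    def forward(self, input_embeds):  # input_embeds: (batch, seq_len, embed_dim)
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

# SPoT-style transfer: train on source task(s), then reuse as initialisation.
source_prompt = SoftPrompt()
# ... train source_prompt together with a frozen backbone on source tasks ...
target_prompt = SoftPrompt()
target_prompt.prompt.data.copy_(source_prompt.prompt.data)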
UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning
TLDR: A unified framework, UniPELT, is proposed, which incorporates different PELT methods as submodules and learns to activate the ones that best suit the current data or task setup; it often surpasses the upper bound obtained by taking the best individual performance of each submodule on each task.

References

Showing 1-10 of 54 references
BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning
TLDR: Using new adaptation modules, PALs ('projected attention layers'), this work matches the performance of separately fine-tuned models on the GLUE benchmark with roughly 7 times fewer parameters, and obtains state-of-the-art results on the Recognizing Textual Entailment dataset.
Universal Language Model Fine-tuning for Text Classification
TLDR: This work proposes Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduces techniques that are key for fine-tuning a language model.
BERT-A: Fine-tuning BERT with Adapters and Data Augmentation
We tackle the contextual question answering (QA) problem on the SQuAD 2.0 dataset. Our project has two main objectives. Firstly, we aim to build a model that achieves a reasonable performance while…
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Universal Sentence Encoder
TLDR: It is found that transfer learning using sentence embeddings tends to outperform word-level transfer, achieving surprisingly good performance with minimal amounts of supervised training data for a transfer task.
Improving Language Understanding by Generative Pre-Training
TLDR: The general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon the state of the art in 9 out of the 12 tasks studied.
Coarse-to-Fine Question Answering for Long Documents
TLDR: A framework for question answering that can efficiently scale to longer documents while maintaining or even improving the performance of state-of-the-art models is presented; sentence selection is treated as a latent variable trained jointly, from the answer supervision alone, using reinforcement learning.
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
TLDR: A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models' understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models, which favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge transfer across tasks.
Efficient Parametrization of Multi-domain Deep Neural Networks
TLDR: This paper proposes to consider universal parametric families of neural networks, which still contain specialized problem-specific models but differ only by a small number of parameters, and shows that these universal parametrizations are very effective for transfer learning, where they outperform traditional fine-tuning techniques.
Incremental Learning Through Deep Adaptation
TLDR: This work proposes a method called Deep Adaptation Modules (DAM) that constrains newly learned filters to be linear combinations of existing ones, and reduces the parameter cost to around 3 percent of the original with negligible or no loss in accuracy.