Corpus ID: 238857301

UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning

@article{Mao2021UniPELTAU,
  title={UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning},
  author={Yuning Mao and Lambert Mathias and Rui Hou and Amjad Almahairi and Hao Ma and Jiawei Han and Wen-tau Yih and Madian Khabsa},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.07577}
}
Conventional fine-tuning of pre-trained language models tunes all model parameters and stores a full model copy for each downstream task, which has become increasingly infeasible as model sizes grow. Recent parameter-efficient language model tuning (PELT) methods manage to match the performance of fine-tuning with far fewer trainable parameters and perform especially well when training data is limited. However, different PELT methods may perform rather differently on the same…
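As a rough illustration of the parameter savings PELT methods target, the sketch below freezes a generic Transformer encoder (a stand-in, not the models evaluated in the paper) and trains only a small added module; the layer sizes and the shape of the added module are illustrative assumptions.

```python
# Minimal sketch of the trainable-parameter gap between full fine-tuning and a
# PELT-style setup: freeze a Transformer backbone and train only a small added
# module. The backbone is a generic stand-in, not the paper's actual models.
import torch.nn as nn

backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True),
    num_layers=12,
)

# Full fine-tuning: every backbone weight is trainable and stored per task.
full = sum(p.numel() for p in backbone.parameters())

# PELT-style: freeze the backbone, train only a small task module (toy 2-layer head).
for p in backbone.parameters():
    p.requires_grad_(False)
task_module = nn.Sequential(nn.Linear(768, 64), nn.ReLU(), nn.Linear(64, 2))

trainable = sum(p.numel() for p in task_module.parameters())
print(f"full fine-tuning: {full:,} params per task")
print(f"frozen backbone + small module: {trainable:,} trainable params per task")
```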

Citations

VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
TLDR
This paper introduces adapter-based parameter-efficient transfer learning techniques to V&L models such as VL-BART and VL-T5, and demonstrates that training the adapter with the weight-sharing technique can match the performance of fine-tuning the entire model.
Discourse-Aware Prompt Design for Text Generation
TLDR
This work shows that prompt-based conditional text generation can be improved with simple and efficient methods that simulate modeling the discourse structure of human-written text, and proposes sparse prefix tuning, which introduces attention sparsity on the prefix parameters at different layers of the network and learns sparse transformations on the softmax function.

References

SHOWING 1-10 OF 23 REFERENCES
LoRA: Low-Rank Adaptation of Large Language Models
TLDR
Low-Rank Adaptation, or LoRA, is proposed, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.
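The TLDR above describes the core mechanism; a minimal sketch of the idea follows, assuming PyTorch and illustrative rank/scaling hyperparameters (r, alpha) rather than the authors' released implementation: the pre-trained weight stays frozen and only the low-rank factors A and B are trained.

```python
# Minimal LoRA-style sketch (assumed hyperparameters r and alpha; not the
# authors' implementation): the pre-trained weight W is frozen and a trainable
# low-rank update B @ A is added to the layer's output.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():           # freeze W (and bias)
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init => no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)


layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 16, 768))               # only A and B receive gradients
```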
COMPACTER: Efficient Low-Rank Hypercomplex Adapter Layers
Adapting large-scale pretrained language models to downstream tasks via fine-tuning is the standard method for achieving state-of-the-art performance on NLP benchmarks. However, fine-tuning all…
Parameter-Efficient Transfer Learning for NLP
TLDR
To demonstrate the adapters' effectiveness, the recently proposed BERT Transformer model is transferred to 26 diverse text classification tasks, including the GLUE benchmark, and adapters attain near state-of-the-art performance whilst adding only a few parameters per task.
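A minimal sketch of such a bottleneck adapter, under assumed hidden and bottleneck sizes rather than the paper's exact configuration: a down-projection, nonlinearity, and up-projection wrapped in a residual connection, inserted after a frozen sub-layer.

```python
# Minimal bottleneck-adapter sketch: down-projection / nonlinearity /
# up-projection with a residual connection, placed after a frozen sub-layer.
# The bottleneck size is an illustrative assumption.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    def __init__(self, hidden: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, hidden)

    def forward(self, h):
        return h + self.up(self.act(self.down(h)))  # residual keeps the frozen path intact


adapter = Adapter()
h = torch.randn(2, 16, 768)                          # sub-layer output from a frozen backbone
print(adapter(h).shape, sum(p.numel() for p in adapter.parameters()))
```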
Prefix-Tuning: Optimizing Continuous Prompts for Generation
TLDR
Prefix-tuning is proposed, a lightweight alternative to fine-tuning for natural language generation tasks, which keeps language model parameters frozen but optimizes a small continuous task-specific vector (called the prefix).
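A simplified single-head sketch of the idea, with an assumed prefix length and without the MLP reparameterization used in the paper: trainable key/value vectors are prepended to the frozen model's keys and values inside attention.

```python
# Simplified prefix-tuning sketch (single head, one layer): trainable key/value
# vectors are prepended to the frozen model's keys and values. The prefix
# length is an assumption; real implementations apply this at every layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden, prefix_len = 768, 10
prefix_k = nn.Parameter(torch.randn(prefix_len, hidden) * 0.02)   # trainable
prefix_v = nn.Parameter(torch.randn(prefix_len, hidden) * 0.02)   # trainable

def attend(q, k, v):
    # q, k, v: (seq, hidden) activations from the frozen model
    k = torch.cat([prefix_k, k], dim=0)            # (prefix_len + seq, hidden)
    v = torch.cat([prefix_v, v], dim=0)
    scores = q @ k.T / hidden ** 0.5
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(16, hidden)                # frozen-model activations (stand-in)
out = attend(q, k, v)                              # (16, 768); only the prefix is trained
```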
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
TLDR
A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models, which favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge-transfer across tasks.
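For reference, the nine GLUE tasks can be pulled with the Hugging Face `datasets` library; the tooling choice here is an assumption, not part of the benchmark paper itself.

```python
# The nine GLUE tasks, loadable with the Hugging Face `datasets` library
# (a tooling assumption, not something prescribed by the benchmark paper).
from datasets import load_dataset

GLUE_TASKS = ["cola", "sst2", "mrpc", "qqp", "stsb", "mnli", "qnli", "rte", "wnli"]
sst2 = load_dataset("glue", "sst2")                # splits: train / validation / test
print(sst2["train"][0])                            # {'sentence': ..., 'label': ..., 'idx': ...}
```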
Making Pre-trained Language Models Better Few-shot Learners
TLDR
The LM-BFF approach makes minimal assumptions on task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning.
The Power of Scale for Parameter-Efficient Prompt Tuning
TLDR
This work explores “prompt tuning”, a simple yet effective mechanism for learning “soft prompts” to condition frozen language models to perform specific downstream tasks, and shows that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning.
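A minimal sketch of soft prompts, with an assumed prompt length: a handful of trainable embeddings are prepended to the frozen model's input embeddings, and they are the only parameters updated.

```python
# Minimal soft-prompt sketch: trainable prompt embeddings are prepended to the
# (frozen) input embeddings. Vocabulary size, hidden size, and prompt length
# are illustrative assumptions.
import torch
import torch.nn as nn

vocab, hidden, prompt_len = 30522, 768, 20
token_emb = nn.Embedding(vocab, hidden)            # stand-in for the frozen model's embeddings
token_emb.weight.requires_grad_(False)
soft_prompt = nn.Parameter(torch.randn(prompt_len, hidden) * 0.02)  # the only trained weights

input_ids = torch.randint(0, vocab, (2, 16))       # (batch, seq)
x = token_emb(input_ids)                           # (2, 16, 768)
x = torch.cat([soft_prompt.expand(2, -1, -1), x], dim=1)  # (2, 36, 768) -> fed to frozen LM
```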
Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks
TLDR
This paper shows that one can learn adapter parameters for all layers and tasks by generating them using shared hypernetworks, which condition on task, adapter position, and layer id in a transformer model.
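A minimal sketch of the shared-hypernetwork idea, with illustrative embedding and adapter sizes: one small network, conditioned on task, adapter-position, and layer-ID embeddings, emits the adapter weights for every layer and task.

```python
# Minimal shared-hypernetwork sketch: a single linear hypernetwork, conditioned
# on (task, adapter position, layer id) embeddings, emits the flattened weights
# of a bottleneck adapter. All sizes are illustrative assumptions.
import torch
import torch.nn as nn

hidden, bottleneck, emb = 768, 24, 64
task_emb = nn.Embedding(8, emb)      # 8 tasks
pos_emb = nn.Embedding(2, emb)       # after attention / after FFN
layer_emb = nn.Embedding(12, emb)    # 12 layers

n_weights = 2 * hidden * bottleneck  # down- and up-projection, biases omitted for brevity
hypernet = nn.Linear(3 * emb, n_weights)  # shared across all tasks, positions, layers

def adapter_weights(task, pos, layer):
    z = torch.cat([task_emb(task), pos_emb(pos), layer_emb(layer)], dim=-1)
    w = hypernet(z)
    down = w[: hidden * bottleneck].view(bottleneck, hidden)
    up = w[hidden * bottleneck:].view(hidden, bottleneck)
    return down, up

down, up = adapter_weights(torch.tensor(0), torch.tensor(1), torch.tensor(5))
```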
On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation
TLDR
It is demonstrated that 1) adapter-based tuning outperforms fine-tuning on low-resource and cross-lingual tasks; 2) it is more robust to overfitting and less sensitive to changes in learning rates.
What Would Elsa Do? Freezing Layers During Transformer Fine-Tuning
TLDR
This paper examines two recent pretrained language models, BERT and RoBERTa, across standard tasks in textual entailment, semantic similarity, sentiment analysis, and linguistic acceptability, and shows that only a fourth of the final layers need to be fine-tuned to achieve 90% of the original quality.
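A sketch of such partial freezing, assuming a BERT-style model from the `transformers` library (the `bert.encoder.layer` attribute path is specific to that model class): only the last three of twelve encoder layers, plus the task head, remain trainable.

```python
# Sketch of partial freezing in the spirit of this paper: keep only the last
# few encoder layers (and the task head) trainable. Assumes a BERT-style model
# from the `transformers` library.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

for p in model.bert.parameters():                  # freeze the whole encoder first
    p.requires_grad_(False)
for layer in model.bert.encoder.layer[-3:]:        # unfreeze only the final 3 of 12 layers
    for p in layer.parameters():
        p.requires_grad_(True)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,}")
```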