Prefix-Tuning: Optimizing Continuous Prompts for Generation

@inproceedings{Li2021PrefixTuningOC,
  title={Prefix-Tuning: Optimizing Continuous Prompts for Generation},
  author={Xiang Lisa Li and Percy Liang},
  booktitle={Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
  year={2021}
}
  • Xiang Lisa Li, Percy Liang
  • Published 2021
  • Computer Science
  • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Fine-tuning is the de facto way of leveraging large pretrained language models for downstream tasks. However, fine-tuning modifies all the language model parameters and therefore necessitates storing a full copy for each task. In this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks, which keeps language model parameters frozen and instead optimizes a sequence of continuous task-specific vectors, which we call the prefix. Prefix… 
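To make the mechanism concrete, below is a minimal, hedged sketch of the idea in PyTorch with Hugging Face transformers: the pretrained GPT-2 weights are frozen, and only a per-layer bundle of key/value vectors (the prefix) is trained and handed to the model as cached attention states. Names such as PrefixTuning and num_virtual_tokens are illustrative, not from the authors' released code, and the MLP reparameterization of the prefix used in the paper is omitted for brevity.

```python
# Minimal prefix-tuning sketch in PyTorch + Hugging Face transformers.
# The pretrained LM stays frozen; only the prefix key/value vectors train.
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer


class PrefixTuning(nn.Module):
    def __init__(self, model_name="gpt2", num_virtual_tokens=10):
        super().__init__()
        self.lm = GPT2LMHeadModel.from_pretrained(model_name)
        for p in self.lm.parameters():        # freeze every LM weight
            p.requires_grad = False
        cfg = self.lm.config
        self.num_virtual_tokens = num_virtual_tokens
        head_dim = cfg.n_embd // cfg.n_head
        # One trainable key and one value vector per layer, head and virtual
        # token: shape (n_layer, 2, n_head, prefix_len, head_dim).
        self.prefix = nn.Parameter(
            0.02 * torch.randn(cfg.n_layer, 2, cfg.n_head,
                               num_virtual_tokens, head_dim))

    def forward(self, input_ids, attention_mask, labels=None):
        bsz = input_ids.size(0)
        # Hand the prefix to the LM as cached key/value states (legacy tuple
        # format), so every attention layer can attend to it.
        past_key_values = tuple(
            (layer[0].unsqueeze(0).expand(bsz, -1, -1, -1),
             layer[1].unsqueeze(0).expand(bsz, -1, -1, -1))
            for layer in self.prefix)
        prefix_mask = torch.ones(bsz, self.num_virtual_tokens,
                                 dtype=attention_mask.dtype,
                                 device=attention_mask.device)
        return self.lm(input_ids=input_ids,
                       attention_mask=torch.cat([prefix_mask, attention_mask], dim=1),
                       past_key_values=past_key_values,
                       labels=labels)


tok = GPT2Tokenizer.from_pretrained("gpt2")
model = PrefixTuning()
batch = tok(["name : Aromi | food : Chinese"], return_tensors="pt")
out = model(batch["input_ids"], batch["attention_mask"], labels=batch["input_ids"])
out.loss.backward()   # gradients reach only model.prefix
```

Saving a task then means storing only model.prefix (about 184K values in this configuration, versus GPT-2's 124M parameters) rather than a full copy of the language model.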
Control Prefixes for Parameter-Efficient Text Generation
TLDR
A dynamic method, CONTROL PREFIXES, is proposed, which allows for the inclusion of conditional, input-dependent information, combining the benefits of prompt tuning and controlled generation, and can even outperform full fine-tuning methods.
P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks
TLDR
P-Tuning v2 demonstrates the novel empirical finding that properly optimized prompt tuning can be universally effective across a wide range of model scales and NLU tasks, matching the performance of fine-tuning while tuning only 0.1%-3% of the parameters.
P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks
TLDR
The method P-Tuning v2 is an implementation of deep prompt tuning optimized and adapted for NLU, and can serve as an alternative to fine-tuning and a strong baseline for future research.
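For comparison with the sketch above, the Hugging Face peft library ships an implementation of layer-wise ("deep") prompt tuning under the name prefix tuning; the short sketch below shows how it might be attached to a frozen encoder for an NLU-style classification task. The base model, task type and prompt length are illustrative choices, not the configuration from the P-Tuning v2 paper.

```python
# Hedged sketch: deep prompt tuning via the Hugging Face peft library,
# which implements per-layer prompts under the name prefix tuning.
from transformers import AutoModelForSequenceClassification
from peft import PrefixTuningConfig, TaskType, get_peft_model

base = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
peft_config = PrefixTuningConfig(
    task_type=TaskType.SEQ_CLS,   # NLU-style sequence classification
    num_virtual_tokens=20,        # prompt length injected at every layer
)
model = get_peft_model(base, peft_config)
model.print_trainable_parameters()  # typically well under 1% of the model
```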
Context-Tuning: Learning Contextualized Prompts for Natural Language Generation
TLDR
A novel continuous prompting approach, called Context-Tuning, for fine-tuning PLMs for natural language generation, which models an inverse generation process from output to input and uses continuous inverse prompting to refine the natural language generation process.
Unfreeze with Care: Space-Efficient Fine-Tuning of Semantic Parsing Models
TLDR
While prefix tuning is shown to do poorly for semantic parsing tasks off the shelf, it is modified by adding special token embeddings, which results in very strong performance without compromising parameter savings.
HyperPELT: Unified Parameter-Efficient Language Model Tuning for Both Language and Vision-and-Language Tasks
TLDR
A novel unified parameter-efficient transfer learning framework that works effectively on both pure language and V&L tasks, adding fewer trainable parameters in multi-task learning while achieving superior performance and transfer ability compared to state-of-the-art methods.
Control Prefixes for Text Generation
TLDR
A dynamic method, CONTROL PREFIXES, which allows for the inclusion of conditional input-dependent information in each prompt, at the intersection of prompt learning and controlled generation, empowering the model to have finer-grained control during text generation.
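A rough sketch of how such input-dependent control might be layered onto plain prefix-tuning is given below: a shared task prefix is learned alongside a small table of attribute-specific prefixes, and each example's prefix is assembled from the attribute label attached to its input. Shapes, names and the single-attribute setup are illustrative assumptions, not the paper's released implementation.

```python
# Rough sketch of attribute-conditioned ("control") prefixes on top of
# ordinary prefix-tuning; the LM itself stays frozen.
import torch
import torch.nn as nn


class ControlPrefix(nn.Module):
    def __init__(self, n_layer=12, n_head=12, head_dim=64,
                 task_len=10, ctrl_len=5, num_attributes=4):
        super().__init__()
        # Shared task-level prefix, as in plain prefix-tuning.
        self.task_prefix = nn.Parameter(
            0.02 * torch.randn(n_layer, 2, n_head, task_len, head_dim))
        # One extra prefix per attribute value (e.g. per domain or style).
        self.ctrl_prefix = nn.Parameter(
            0.02 * torch.randn(num_attributes, n_layer, 2, n_head,
                               ctrl_len, head_dim))

    def forward(self, attribute_ids):
        # attribute_ids: (batch,) integer label attached to each input.
        bsz = attribute_ids.size(0)
        task = self.task_prefix.unsqueeze(0).expand(bsz, -1, -1, -1, -1, -1)
        ctrl = self.ctrl_prefix[attribute_ids]
        # Concatenate along the prefix-length axis; the result is fed to a
        # frozen LM as past key/value states, exactly as in prefix-tuning.
        return torch.cat([task, ctrl], dim=4)
```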
Discourse-Aware Prompt Design for Text Generation
TLDR
This work shows that prompt-based conditional text generation can be improved with simple and efficient methods that model the discourse structure of human-written text, and proposes sparse prefix tuning, which introduces attention sparsity on the prefix parameters at different layers of the network and learns sparse transformations of the softmax function.
Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models
TLDR
Though delta tuning was initially proposed as an efficient method to steer large models, some of the fascinating evidence discovered along with it could help further reveal the mechanisms of PLMs and even deep neural networks.
In-Style Prefix-Tuned NLG of Emails
  • Computer Science
  • 2022
TLDR
This paper proposes the application of Li and Liang's lighter-weight method of prefix-tuning, which steers language models towards better task-specific outputs by optimizing a small trainable module whose state is prefixed to every LM input.
...
...

References

SHOWING 1-10 OF 57 REFERENCES
How fine can fine-tuning be? Learning efficient language models
TLDR
Fine-tuning of huge language models can be achieved by simply setting a certain number of entries in certain layers of the pre-trained parameters to zero, saving both task-specific parameter storage and computational cost.
The Power of Scale for Parameter-Efficient Prompt Tuning
TLDR
This work explores “prompt tuning”, a simple yet effective mechanism for learning “soft prompts” to condition frozen language models to perform specific downstream tasks, and shows that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning.
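The key difference from prefix-tuning is where the trainable vectors live: prompt tuning prepends them once, at the input embedding layer, rather than injecting key/value prefixes into every attention layer. A minimal hedged sketch, again with PyTorch and a frozen GPT-2; names such as SoftPrompt and the prompt length are illustrative.

```python
# Hedged sketch of prompt tuning: trainable soft-prompt embeddings are
# prepended to the input embeddings of a frozen LM.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM


class SoftPrompt(nn.Module):
    def __init__(self, model_name="gpt2", prompt_len=20):
        super().__init__()
        self.lm = AutoModelForCausalLM.from_pretrained(model_name)
        for p in self.lm.parameters():         # the LM itself stays frozen
            p.requires_grad = False
        emb_dim = self.lm.get_input_embeddings().embedding_dim
        self.prompt = nn.Parameter(0.02 * torch.randn(prompt_len, emb_dim))

    def forward(self, input_ids, attention_mask):
        bsz = input_ids.size(0)
        tok_emb = self.lm.get_input_embeddings()(input_ids)
        prompt = self.prompt.unsqueeze(0).expand(bsz, -1, -1)
        prompt_mask = torch.ones(bsz, self.prompt.size(0),
                                 dtype=attention_mask.dtype,
                                 device=attention_mask.device)
        return self.lm(
            inputs_embeds=torch.cat([prompt, tok_emb], dim=1),
            attention_mask=torch.cat([prompt_mask, attention_mask], dim=1))
```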
Parameter-Efficient Transfer Learning for NLP
TLDR
To demonstrate the adapters' effectiveness, the recently proposed BERT Transformer model is transferred to 26 diverse text classification tasks, including the GLUE benchmark, and adapters attain near state-of-the-art performance whilst adding only a few parameters per task.
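An adapter in this sense is a small bottleneck network inserted after a Transformer sub-layer and trained while the surrounding pretrained weights stay fixed. A minimal sketch of one Houlsby-style adapter block follows; hidden and bottleneck sizes are illustrative.

```python
# Hedged sketch of one adapter block: a bottleneck with a residual
# connection, inserted after a Transformer sub-layer.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    def __init__(self, hidden_dim=768, bottleneck_dim=64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)  # project down
        self.up = nn.Linear(bottleneck_dim, hidden_dim)    # project back up
        self.act = nn.GELU()
        nn.init.zeros_(self.up.weight)   # near-identity at initialization
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden_states):
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```

The zero-initialized up-projection makes each adapter start as an identity mapping, so inserting it does not disturb the pretrained network before training begins.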
Learning How to Ask: Querying LMs with Mixtures of Soft Prompts
TLDR
This work explores the idea of learning prompts by gradient descent—either fine-tuning prompts taken from previous work, or starting from random initialization, showing that the implicit factual knowledge in language models was previously underestimated.
Language Models are Few-Shot Learners
TLDR
GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
Recipes for Adapting Pre-trained Monolingual and Multilingual Models to Machine Translation
TLDR
The benefits and drawbacks of freezing parameters, and adding new ones, when fine-tuning a pre-trained model on Machine Translation (MT), are investigated.
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
TLDR
This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning
TLDR
The experiments show that by just using an additional 2-3% parameters for each task, the model can maintain or even improve the performance of fine-tuning the whole model.
GPT Understands, Too
TLDR
It is shown that GPTs can be better than or comparable to similar-sized BERTs on NLU tasks with a novel method, P-tuning, which employs trainable continuous prompt embeddings and outperforms state-of-the-art approaches on the few-shot SuperGLUE benchmark.
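The peft library also includes an implementation of P-tuning, in which the continuous prompt embeddings are produced by a small prompt encoder rather than optimized directly; the sketch below shows one way it might be attached to a frozen classifier. The base model, prompt length and encoder size are illustrative assumptions.

```python
# Hedged sketch: P-tuning via the Hugging Face peft library, where the
# continuous prompt embeddings come from a small prompt encoder.
from transformers import AutoModelForSequenceClassification
from peft import PromptEncoderConfig, TaskType, get_peft_model

base = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
peft_config = PromptEncoderConfig(
    task_type=TaskType.SEQ_CLS,
    num_virtual_tokens=20,
    encoder_hidden_size=128,   # hidden size of the prompt encoder
)
model = get_peft_model(base, peft_config)
model.print_trainable_parameters()
```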
Incorporating BERT into Neural Machine Translation
TLDR
A new algorithm named BERT-fused model is proposed, in which BERT is first used to extract representations for an input sequence, and then the representations are fused with each layer of the encoder and decoder of the NMT model through attention mechanisms.
...
...