Prefix-Tuning: Optimizing Continuous Prompts for Generation

@inproceedings{Li2021PrefixTuningOC,
  title={Prefix-Tuning: Optimizing Continuous Prompts for Generation},
  author={Xiang Lisa Li and Percy Liang},
  booktitle={Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
  year={2021}
}
  • Xiang Lisa Li, Percy Liang
  • Published 2021
  • Computer Science
  • Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Fine-tuning is the de facto way of leveraging large pretrained language models for downstream tasks. However, fine-tuning modifies all the language model parameters and therefore necessitates storing a full copy for each task. In this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks, which keeps language model parameters frozen and instead optimizes a sequence of continuous task-specific vectors, which we call the prefix. Prefix… 
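As a rough illustration of the mechanism, the sketch below freezes a GPT-2 model and trains only per-layer key/value prefix vectors passed in as past states. It assumes the Hugging Face transformers legacy tuple format for past_key_values; all shapes, hyperparameters, and the example input are illustrative rather than taken from the paper's released code.

    import torch
    import torch.nn as nn
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    class PrefixTuning(nn.Module):
        def __init__(self, model_name="gpt2", prefix_len=10):
            super().__init__()
            self.lm = GPT2LMHeadModel.from_pretrained(model_name)
            for p in self.lm.parameters():            # the LM itself stays frozen
                p.requires_grad = False
            cfg = self.lm.config
            self.prefix_len = prefix_len
            # One trainable key and value vector per layer, head, and prefix position.
            self.prefix = nn.Parameter(
                torch.randn(cfg.n_layer, 2, cfg.n_head, prefix_len,
                            cfg.n_embd // cfg.n_head) * 0.02)

        def forward(self, input_ids, attention_mask):
            bsz = input_ids.size(0)
            # Expand the shared prefix to the batch: one (key, value) pair per layer.
            past = tuple((layer[0].expand(bsz, -1, -1, -1),
                          layer[1].expand(bsz, -1, -1, -1)) for layer in self.prefix)
            # The attention mask must also cover the prefix positions.
            prefix_mask = torch.ones(bsz, self.prefix_len, dtype=attention_mask.dtype)
            mask = torch.cat([prefix_mask, attention_mask], dim=1)
            return self.lm(input_ids=input_ids, attention_mask=mask,
                           past_key_values=past, labels=input_ids)

    tok = GPT2Tokenizer.from_pretrained("gpt2")
    model = PrefixTuning()
    optimizer = torch.optim.AdamW([model.prefix], lr=5e-5)   # only the prefix is updated
    batch = tok(["name[Blue Spice] -> Blue Spice is a coffee shop."], return_tensors="pt")
    loss = model(batch["input_ids"], batch["attention_mask"]).loss
    loss.backward()
    optimizer.step()

Because only the prefix parameter receives gradients, a single frozen copy of the language model can serve many tasks, each with its own small prefix.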
Control Prefixes for Parameter-Efficient Text Generation
TLDR
A dynamic method, CONTROL PREFIXES, is proposed, which allows for the inclusion of conditional input-dependent information, combining the benefits of prompt tuning and controlled generation, and can even outperform full fine-tuning methods.
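A rough sketch of how such input-dependent conditioning could be parameterized: a shared task-level prefix is concatenated with a short attribute-specific prefix selected per example. Module names and shapes below are assumptions for illustration, not the authors' code; the resulting tensor would be fed to the frozen LM as past key/value states, as in prefix-tuning above.

    import torch
    import torch.nn as nn

    class ControlPrefix(nn.Module):
        def __init__(self, n_attributes, prefix_len, ctrl_len, n_layer, n_head, head_dim):
            super().__init__()
            # Shared, task-level prefix (as in plain prefix-tuning).
            self.task_prefix = nn.Parameter(
                torch.randn(n_layer, 2, n_head, prefix_len, head_dim) * 0.02)
            # One extra short prefix per control attribute (e.g. a domain or style label).
            self.ctrl_prefix = nn.Parameter(
                torch.randn(n_attributes, n_layer, 2, n_head, ctrl_len, head_dim) * 0.02)

        def forward(self, attribute_ids):
            # attribute_ids: (batch,) integer labels carrying the conditional input.
            ctrl = self.ctrl_prefix[attribute_ids]                      # (B, L, 2, H, C, D)
            task = self.task_prefix.unsqueeze(0).expand(
                ctrl.size(0), -1, -1, -1, -1, -1)                        # (B, L, 2, H, P, D)
            # Concatenate along the prefix-length dimension.
            return torch.cat([task, ctrl], dim=4)

    prefixes = ControlPrefix(n_attributes=3, prefix_len=10, ctrl_len=4,
                             n_layer=12, n_head=12, head_dim=64)
    out = prefixes(torch.tensor([0, 2]))
    print(out.shape)   # torch.Size([2, 12, 2, 12, 14, 64])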
P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks
TLDR
P-Tuning v2 builds on the novel empirical finding that properly optimized prompt tuning can be universally effective across a wide range of model scales and NLU tasks, matching the performance of fine-tuning while having only 0.1%-3% tuned parameters.
P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks
Prompt tuning, which only tunes continuous prompts with a frozen language model, substantially reduces per-task storage and memory usage at training. However, in the context of NLU, prior work…
Context-Tuning: Learning Contextualized Prompts for Natural Language Generation
TLDR
A novel continuous prompting approach, called Context-Tuning, for fine-tuning PLMs for natural language generation, which models an inverse generation process from output to input and uses continuous inverse prompting to refine the generation process.
HyperPELT: Unified Parameter-Efficient Language Model Tuning for Both Language and Vision-and-Language Tasks
TLDR
A novel unified parameter-efficient transfer learning framework that works effectively on both pure language and V&L tasks, adding fewer trainable parameters in multi-task learning while achieving superior performance and transfer ability compared to state-of-the-art methods.
Discourse-Aware Prompt Design for Text Generation
TLDR
This work shows that prompt-based conditional text generation can be improved with simple and efficient methods that simulate modeling the discourse structure of human-written text, and proposes sparse prefix-tuning, which introduces attention sparsity on the prefix parameters at different layers of the network and learns sparse transformations of the softmax function.
Control Prefixes for Text Generation
TLDR
A dynamic method, CONTROL PREFIXES, which allows for the inclusion of conditional input-dependent information in each prompt, at the intersection of prompt learning and controlled generation, empowering the model to have finer-grained control during text generation.
Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models
TLDR
Though delta tuning was initially proposed as an efficient method to steer large models, some of the fascinating evidence discovered along with it could help further reveal the mechanisms of PLMs and even deep neural networks.
In-Style Prefix-Tuned NLG of Emails
  • Computer Science
  • 2022
TLDR
This paper proposes applying Li and Liang's lightweight prefix-tuning method, which steers language models towards better task-specific outputs by optimizing a small trainable module on top, the state of which is prefixed to every LM input.
Prompt-free and Efficient Few-shot Learning with Language Models
TLDR
PERFECT is a simple and efficient method for few-shot fine-tuning of PLMs without relying on handcrafted prompts and verbalizers; it is highly effective given as few as 32 data points and outperforms existing state-of-the-art few-shot learning methods.

References

SHOWING 1-10 OF 57 REFERENCES
How fine can fine-tuning be? Learning efficient language models
TLDR
Fine-tuning of huge language models can be achieved by simply setting a certain number of entries in certain layers of the pre-trained parameters to zero, saving both task-specific parameter storage and computational cost.
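A tiny sketch of that idea, with an arbitrary random mask standing in for the learned choice of which entries to zero: the task-specific model is just the pretrained weights times a binary mask, so only the mask needs to be stored per task.

    import torch
    import torch.nn as nn

    pretrained = nn.Linear(768, 768)                               # stand-in pretrained layer
    mask = (torch.rand_like(pretrained.weight) > 0.1).float()      # keep ~90% of the entries
    with torch.no_grad():
        task_weight = pretrained.weight * mask                     # zero out the remaining ~10%
    print((task_weight == 0).float().mean())                       # fraction of zeroed entries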
The Power of Scale for Parameter-Efficient Prompt Tuning
TLDR
This work explores “prompt tuning”, a simple yet effective mechanism for learning “soft prompts” to condition frozen language models to perform specific downstream tasks, and shows that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning.
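A minimal sketch of soft-prompt tuning with a frozen GPT-2, where a handful of trainable embedding vectors are prepended to the input embeddings; names, shapes, and the prompt length are assumptions for illustration rather than the paper's setup.

    import torch
    import torch.nn as nn
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tok = GPT2Tokenizer.from_pretrained("gpt2")
    lm = GPT2LMHeadModel.from_pretrained("gpt2")
    for p in lm.parameters():
        p.requires_grad = False                          # the LM itself stays frozen

    prompt_len = 20
    soft_prompt = nn.Parameter(torch.randn(prompt_len, lm.config.n_embd) * 0.02)

    def forward(input_ids, attention_mask):
        bsz = input_ids.size(0)
        tok_emb = lm.transformer.wte(input_ids)                       # (B, T, d)
        prompt = soft_prompt.unsqueeze(0).expand(bsz, -1, -1)         # (B, P, d)
        inputs_embeds = torch.cat([prompt, tok_emb], dim=1)
        mask = torch.cat([torch.ones(bsz, prompt_len, dtype=attention_mask.dtype),
                          attention_mask], dim=1)
        # Prompt positions are labelled -100 so only real tokens contribute to the loss.
        labels = torch.cat([torch.full((bsz, prompt_len), -100), input_ids], dim=1)
        return lm(inputs_embeds=inputs_embeds, attention_mask=mask, labels=labels)

    batch = tok(["The movie was great ->"], return_tensors="pt")
    loss = forward(batch["input_ids"], batch["attention_mask"]).loss
    loss.backward()                                      # only soft_prompt receives gradients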
Parameter-Efficient Transfer Learning for NLP
TLDR
To demonstrate the adapters' effectiveness, the recently proposed BERT Transformer model is transferred to 26 diverse text classification tasks, including the GLUE benchmark, and adapters attain near state-of-the-art performance whilst adding only a few parameters per task.
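A small sketch of the bottleneck adapter this approach inserts into each Transformer layer (dimensions are illustrative): a down-projection, a nonlinearity, an up-projection initialized near the identity, and a residual connection, while the pretrained weights stay frozen.

    import torch
    import torch.nn as nn

    class Adapter(nn.Module):
        def __init__(self, d_model=768, bottleneck=64):
            super().__init__()
            self.down = nn.Linear(d_model, bottleneck)   # project to a small bottleneck
            self.up = nn.Linear(bottleneck, d_model)     # project back up
            nn.init.zeros_(self.up.weight)               # near-identity at initialization
            nn.init.zeros_(self.up.bias)

        def forward(self, hidden):
            return hidden + self.up(torch.relu(self.down(hidden)))  # residual connection

    h = torch.randn(2, 16, 768)
    print(Adapter()(h).shape)   # torch.Size([2, 16, 768])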
Learning How to Ask: Querying LMs with Mixtures of Soft Prompts
TLDR
This work explores the idea of learning prompts by gradient descent—either fine-tuning prompts taken from previous work, or starting from random initialization, showing that the implicit factual knowledge in language models was previously underestimated.
Recipes for Adapting Pre-trained Monolingual and Multilingual Models to Machine Translation
TLDR
The benefits and drawbacks of freezing parameters, and adding new ones, when fine-tuning a pre-trained model on Machine Translation (MT), are investigated.
Incorporating BERT into Neural Machine Translation
TLDR
A new algorithm named BERT-fused model is proposed, in which BERT is first used to extract representations for an input sequence, and then the representations are fused with each layer of the encoder and decoder of the NMT model through attention mechanisms.
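A rough sketch of the fusion step under stated assumptions (module names, dimensions, and the simple averaging are illustrative): each NMT layer attends over the frozen BERT representations of the source sentence and combines the result with its own hidden states.

    import torch
    import torch.nn as nn

    class BertFusionLayer(nn.Module):
        def __init__(self, d_model=512, d_bert=768, n_heads=8):
            super().__init__()
            self.proj = nn.Linear(d_bert, d_model)             # map BERT dim to model dim
            self.bert_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

        def forward(self, hidden, bert_states):
            # hidden: (B, T, d_model) layer states; bert_states: (B, S, d_bert) frozen BERT output
            mem = self.proj(bert_states)
            fused, _ = self.bert_attn(hidden, mem, mem)        # attend over the BERT memory
            return 0.5 * (hidden + fused)                      # average the two signals

    layer = BertFusionLayer()
    out = layer(torch.randn(2, 10, 512), torch.randn(2, 20, 768))
    print(out.shape)   # torch.Size([2, 10, 512])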
Plug and Play Language Models: A Simple Approach to Controlled Text Generation
TLDR
The Plug and Play Language Model (PPLM) for controllable language generation is proposed, which combines a pretrained LM with one or more simple attribute classifiers that guide text generation without any further training of the LM.
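A toy, simplified sketch of the plug-and-play idea: the last hidden state is nudged along the gradient of a separate attribute classifier before the token logits are recomputed. The real method perturbs the key/value history over several iterations; here the classifier is an untrained stand-in and the step size is arbitrary.

    import torch
    import torch.nn as nn
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tok = GPT2Tokenizer.from_pretrained("gpt2")
    lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()
    clf = nn.Linear(lm.config.n_embd, 2)          # stand-in attribute classifier (untrained)

    ids = tok("The food at this place is", return_tensors="pt").input_ids
    hidden = lm.transformer(ids).last_hidden_state[:, -1]         # last hidden state, (1, d)
    hidden = hidden.detach().requires_grad_(True)
    attr_logp = torch.log_softmax(clf(hidden), dim=-1)[0, 1]      # log p(attribute | h)
    attr_logp.backward()
    steered = hidden + 0.02 * hidden.grad                         # one gradient-ascent nudge
    next_id = lm.lm_head(steered).argmax(dim=-1)                  # recompute token logits
    print(tok.decode(next_id))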
Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning
TLDR
This paper empirically shows that common pre-trained models have a very low intrinsic dimension, and connects intrinsic dimensionality with low-dimensional task representations and compression-based generalization bounds to provide intrinsic-dimension-based generalization bounds that are independent of the full parameter count.
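A compact sketch of the random-subspace reparameterization used to probe intrinsic dimension (the toy model and shapes are assumptions; torch.func requires PyTorch 2.x): every parameter update is generated from a single low-dimensional vector through fixed random projections.

    import torch
    import torch.nn as nn
    from torch.func import functional_call

    d_int = 100                                                # intrinsic dimension being tested
    model = nn.Linear(784, 10)                                 # stand-in "pretrained" model
    theta0 = {n: p.detach().clone() for n, p in model.named_parameters()}
    projs = {n: torch.randn(p.numel(), d_int) / d_int ** 0.5
             for n, p in model.named_parameters()}
    z = nn.Parameter(torch.zeros(d_int))                       # the only trainable numbers

    def forward(x):
        # theta = theta_0 + P z for every parameter tensor, then a functional call
        params = {n: theta0[n] + (projs[n] @ z).view_as(theta0[n]) for n in theta0}
        return functional_call(model, params, (x,))

    loss = forward(torch.randn(32, 784)).square().mean()
    loss.backward()
    print(z.grad.shape)   # torch.Size([100])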
Text Summarization with Pretrained Encoders
TLDR
This paper introduces a novel document-level encoder based on BERT which is able to express the semantics of a document and obtain representations for its sentences, and proposes a new fine-tuning schedule that adopts different optimizers for the encoder and the decoder as a means of alleviating the mismatch between the two.
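A minimal sketch of such a split fine-tuning schedule (modules and learning rates are illustrative stand-ins): the pretrained encoder and the freshly initialized decoder get separate optimizers with different learning rates.

    import torch
    import torch.nn as nn

    encoder = nn.Linear(768, 768)        # stand-in for the pretrained encoder
    decoder = nn.Linear(768, 768)        # stand-in for the randomly initialized decoder
    opt_enc = torch.optim.Adam(encoder.parameters(), lr=2e-5)   # small LR: already pretrained
    opt_dec = torch.optim.Adam(decoder.parameters(), lr=1e-3)   # larger LR: trained from scratch
    # During training, both optimizers are stepped after each backward pass.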
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.