The Power of Scale for Parameter-Efficient Prompt Tuning

@article{Lester2021ThePO,
  title={The Power of Scale for Parameter-Efficient Prompt Tuning},
  author={Brian Lester and Rami Al-Rfou and Noah Constant},
  journal={ArXiv},
  year={2021},
  volume={abs/2104.08691}
}
In this work, we explore “prompt tuning,” a simple yet effective mechanism for learning “soft prompts” to condition frozen language models to perform specific downstream tasks. Unlike the discrete text prompts used by GPT-3, soft prompts are learned through backpropagation and can be tuned to incorporate signals from any number of labeled examples. Our end-to-end learned approach outperforms GPT-3’s few-shot learning by a large margin. More remarkably, through ablations on model size using T5… 
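As a rough illustration of the mechanism described in the abstract, here is a minimal sketch in PyTorch (class and variable names are hypothetical, not the authors' released code): a handful of trainable soft-prompt vectors are prepended to the frozen model's input embeddings, and only those vectors receive gradients.

```python
import torch
import torch.nn as nn

class SoftPromptWrapper(nn.Module):
    """Prepend trainable soft-prompt vectors to a frozen embedding layer."""

    def __init__(self, embed_tokens: nn.Embedding, prompt_length: int = 20):
        super().__init__()
        self.embed_tokens = embed_tokens
        for p in self.embed_tokens.parameters():
            p.requires_grad = False                  # the LM stays frozen
        d_model = embed_tokens.embedding_dim
        # The only trainable parameters: prompt_length x d_model values.
        self.soft_prompt = nn.Parameter(torch.randn(prompt_length, d_model) * 0.02)

    def forward(self, input_ids: torch.LongTensor) -> torch.Tensor:
        batch = input_ids.size(0)
        tok = self.embed_tokens(input_ids)                       # (B, T, D)
        prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, tok], dim=1)                   # (B, P+T, D)

# Usage: optimize only the soft prompt; the resulting embeddings would be fed
# to a frozen model such as T5 (a stand-in embedding table is used here).
embed = nn.Embedding(32000, 512)
wrapper = SoftPromptWrapper(embed, prompt_length=20)
optimizer = torch.optim.Adam([wrapper.soft_prompt], lr=0.3)
inputs_embeds = wrapper(torch.randint(0, 32000, (2, 8)))
```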
P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks
TLDR
The method P-Tuning v2 is an implementation of deep prompt tuning optimized and adapted for NLU, and can serve as an alternative to fine-tuning and a strong baseline for future research.
P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks
TLDR
This work presents the novel empirical finding that properly optimized prompt tuning can be universally effective across a wide range of model scales and NLU tasks, matching the performance of fine-tuning while tuning only 0.1%-3% of the parameters.
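A minimal sketch of the deep-prompt idea behind P-Tuning v2 (hypothetical layer sizes, not the authors' code): a separate trainable prompt is prepended at every layer of a frozen encoder, so the tuned parameters remain a small fraction of the backbone.

```python
import torch
import torch.nn as nn

class DeepPromptedEncoder(nn.Module):
    """Frozen transformer encoder with a trainable prompt at every layer."""

    def __init__(self, num_layers=4, d_model=256, n_heads=4, prompt_length=16):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
             for _ in range(num_layers)]
        )
        for p in self.layers.parameters():
            p.requires_grad = False                  # the backbone stays frozen
        # One trainable prompt per layer: (num_layers, prompt_length, d_model).
        self.prompts = nn.Parameter(torch.randn(num_layers, prompt_length, d_model) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:          # x: (B, T, D)
        batch = x.size(0)
        p_len = self.prompts.size(1)
        for layer, prompt in zip(self.layers, self.prompts):
            p = prompt.unsqueeze(0).expand(batch, -1, -1)
            x = layer(torch.cat([p, x], dim=1))[:, p_len:]        # drop prompt slots
        return x

enc = DeepPromptedEncoder()
trainable = sum(p.numel() for p in enc.parameters() if p.requires_grad)
total = sum(p.numel() for p in enc.parameters())
print(f"tuned fraction: {trainable / total:.1%}")  # a fraction of a percent here
```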
SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer
TLDR
It is shown that SPoT significantly boosts the performance of Prompt Tuning across many tasks, and an efficient retrieval approach is proposed that interprets task prompts as task embeddings to identify similar tasks and predict the most transferable source tasks for a novel target task.
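A minimal sketch of the retrieval step described above (toy data, not the SPoT code): each task's learned soft prompt acts as a task embedding, and cosine similarity between embeddings ranks candidate source tasks for a new target task.

```python
import torch
import torch.nn.functional as F

def task_embedding(soft_prompt: torch.Tensor) -> torch.Tensor:
    """Collapse a (prompt_length, d_model) soft prompt into one vector."""
    return soft_prompt.mean(dim=0)

# Pretend library of source-task prompts learned elsewhere.
source_prompts = {name: torch.randn(20, 512) for name in ["mnli", "squad", "record"]}
target_prompt = torch.randn(20, 512)   # prompt tuned briefly on the target task

target_emb = task_embedding(target_prompt)
scores = {
    name: F.cosine_similarity(target_emb, task_embedding(p), dim=0).item()
    for name, p in source_prompts.items()
}
best_source = max(scores, key=scores.get)   # predicted most transferable source
print(best_source, scores)
```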
PPT: Pre-trained Prompt Tuning for Few-shot Learning
TLDR
This work proposes to pre-train prompts by adding soft prompts into the pre-training stage to obtain a better initialization; the resulting Pre-trained Prompt Tuning framework is named “PPT”, and similar classification tasks are formulated into a unified form to ensure its generalization.
Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models
TLDR
Though initially proposed as an efficient method to steer large models, some of the fascinating evidence discovered along with delta tuning could help further reveal the mechanisms of PLMs and even deep neural networks.
Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models
TLDR
This work shows that fine-tuning LMs in the few-shot setting can considerably reduce the need for prompt engineering, and recommends fine-tuned LMs for few-shot learning as they are more accurate, robust to different prompts, and can be made nearly as efficient as frozen LMs.
No More Fine-Tuning? An Experimental Evaluation of Prompt Tuning in Code Intelligence
Pre-trained models have been shown to be effective in many code intelligence tasks. These models are pre-trained on large-scale unlabeled corpora and then fine-tuned on downstream tasks. However, as the…
Prompt-free and Efficient Few-shot Learning with Language Models
TLDR
Experiments demonstrate that PERFECT, a simple and efficient method for few-shot fine-tuning of PLMs without relying on handcrafted patterns and verbalizers, also outperforms existing state-of-the-art few-shot learning methods.
Instance-wise Prompt Tuning for Pretrained Language Models
TLDR
Instance-wise Prompt Tuning (IPT) is introduced, the first prompt learning paradigm that injects knowledge from the input data instances into the prompts, thereby providing PLMs with richer and more concrete context information.
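A minimal sketch of instance-dependent prompts in the spirit of IPT (hypothetical architecture, not the paper's code): a small trainable network maps each input's pooled embedding to its own soft prompt, so different instances condition the frozen PLM differently.

```python
import torch
import torch.nn as nn

class InstancePromptGenerator(nn.Module):
    """Generate a per-instance soft prompt from pooled token embeddings."""

    def __init__(self, d_model: int = 512, prompt_length: int = 8):
        super().__init__()
        self.prompt_length = prompt_length
        self.net = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.Tanh(),
            nn.Linear(d_model, prompt_length * d_model),
        )

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:  # (B, T, D)
        pooled = token_embeds.mean(dim=1)                           # (B, D)
        prompts = self.net(pooled)                                  # (B, P*D)
        return prompts.view(-1, self.prompt_length, token_embeds.size(-1))

gen = InstancePromptGenerator()
x = torch.randn(2, 10, 512)            # stand-in for frozen-PLM token embeddings
prompt = gen(x)                         # (2, 8, 512), one prompt per instance
inputs_embeds = torch.cat([prompt, x], dim=1)
```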
Parameter-Efficient Prompt Tuning Makes Generalized and Calibrated Neural Text Retrievers
TLDR
Through an extensive analysis, it is shown that the prompt tuning strategy can mitigate the two issues of parameter inefficiency and weak generalizability faced by fine-tuning-based retrieval methods, and can improve the out-of-domain zero-shot generalization of retrieval models.
...

References

Prefix-Tuning: Optimizing Continuous Prompts for Generation
TLDR
Prefix-tuning is proposed, a lightweight alternative to fine-tuning for natural language generation tasks, which keeps language model parameters frozen and instead optimizes a sequence of continuous task-specific vectors, called the prefix.
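A minimal sketch of the prefix mechanism (hypothetical shapes, not the paper's code): trainable prefix keys and values are prepended inside attention, while the frozen model's own projections still produce the queries, keys, and values for the real tokens.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, prefix_len, seq_len, batch = 64, 5, 12, 2

# The only trainable task-specific parameters: one key/value prefix per layer.
prefix_k = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)
prefix_v = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)

# Stand-ins for the frozen model's projected queries/keys/values.
q = torch.randn(batch, seq_len, d_model)
k = torch.randn(batch, seq_len, d_model)
v = torch.randn(batch, seq_len, d_model)

k = torch.cat([prefix_k.expand(batch, -1, -1), k], dim=1)   # (B, P+T, D)
v = torch.cat([prefix_v.expand(batch, -1, -1), v], dim=1)

attn = F.softmax(q @ k.transpose(-2, -1) / d_model ** 0.5, dim=-1)  # (B, T, P+T)
out = attn @ v                                                      # (B, T, D)
```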
Language Models are Few-Shot Learners
TLDR
GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
Learning How to Ask: Querying LMs with Mixtures of Soft Prompts
TLDR
This work explores the idea of learning prompts by gradient descent, either fine-tuning prompts taken from previous work or starting from random initialization, and shows that the implicit factual knowledge in language models was previously underestimated.
Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning
TLDR
This paper empirically shows that common pre-trained models have a very low intrinsic dimension, and connects intrinsic dimensionality with low-dimensional task representations and compression-based generalization bounds to provide intrinsic-dimension-based generalization bounds that are independent of the full parameter count.
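A minimal sketch of the low-intrinsic-dimension reparametrization (toy sizes, not the paper's setup): only a small vector is trained, and a fixed random projection maps it into the full parameter space.

```python
import torch

D, d = 10_000, 50                        # full vs. intrinsic dimensionality
theta_0 = torch.randn(D)                 # frozen pre-trained parameters (flattened)
P = torch.randn(D, d) / d ** 0.5         # fixed random projection, not trained
z = torch.nn.Parameter(torch.zeros(d))   # the only trainable parameters

def current_parameters() -> torch.Tensor:
    """Effective fine-tuned parameters: a low-dimensional offset from theta_0."""
    return theta_0 + P @ z

opt = torch.optim.SGD([z], lr=0.1)
# A toy objective standing in for a task loss on the reparametrized model.
loss = (current_parameters() ** 2).mean()
loss.backward()
opt.step()
```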
GPT Understands, Too
TLDR
It is shown that GPTs can be better than or comparable to similar-sized BERTs on NLU tasks with a novel method P-tuning— which employs trainable continuous prompt embeddings and outperforms the state-of-the-art approaches on the few-shot SuperGlue benchmark.
Parameter-Efficient Transfer Learning for NLP
TLDR
To demonstrate adapters' effectiveness, the recently proposed BERT Transformer model is transferred to 26 diverse text classification tasks, including the GLUE benchmark, and adapters attain near state-of-the-art performance while adding only a few parameters per task.
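A minimal sketch of a bottleneck adapter (hypothetical sizes, not the paper's code): a down-projection, nonlinearity, and up-projection with a residual connection is inserted after a frozen sublayer, and only these few parameters are trained per task.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter with a residual connection around it."""

    def __init__(self, d_model: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.GELU()
        nn.init.zeros_(self.up.weight)           # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(self.act(self.down(h)))

adapter = Adapter()
h = torch.randn(2, 10, 768)        # output of a frozen transformer sublayer
out = adapter(h)                   # same shape; only ~100k new parameters
print(sum(p.numel() for p in adapter.parameters()))
```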
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
TLDR
This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference
TLDR
This work introduces Pattern-Exploiting Training (PET), a semi-supervised training procedure that reformulates input examples as cloze-style phrases to help language models understand a given task.
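A minimal sketch of the cloze reformulation (the pattern and verbalizer below are illustrative choices, not necessarily the paper's): an input pair is rewritten as a phrase with a masked slot, and label words at that slot stand in for the class labels.

```python
# Cloze-style pattern and verbalizer in the spirit of PET.
MASK = "[MASK]"

def pattern(premise: str, hypothesis: str) -> str:
    """Rewrite an NLI pair as a cloze phrase with a masked label slot."""
    return f'"{hypothesis}"? {MASK}, "{premise}"'

VERBALIZER = {"entailment": "Yes", "contradiction": "No", "neutral": "Maybe"}

example = pattern("A man is playing a guitar.", "A person plays music.")
print(example)  # the LM scores "Yes"/"No"/"Maybe" at the [MASK] position
```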
Improving Language Understanding by Generative Pre-Training
TLDR
The general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, improving upon the state of the art in 9 out of the 12 tasks studied.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
...