Corpus ID: 239009558

SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer

Tu Vu, Brian Lester, Noah Constant, Rami Al-Rfou, Daniel Matthew Cer
As pre-trained language models have gotten larger, there has been growing interest in parameter-efficient methods to apply these models to downstream tasks. Building on the prompt tuning approach of Lester et al. (2021), which learns task-specific soft prompts to condition a frozen language model to perform downstream tasks, we propose a novel prompt-based transfer learning approach called SPoT: Soft Prompt Transfer. SPoT first learns a prompt on one or more source tasks and then uses it to…
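The two-step recipe in the abstract (learn a prompt on a source task, then reuse it to initialize the target task's prompt) can be sketched with a toy quadratic objective standing in for the prompt-tuning loss. Everything below is a hypothetical stand-in — the shapes, the `train_prompt` helper, and the quadratic "tasks" are illustrative, not the paper's actual setup:

```python
import numpy as np

PROMPT_LEN, EMB = 4, 8  # illustrative prompt length and embedding size

def train_prompt(grad_fn, init, steps, lr=0.1):
    """Gradient descent on the soft prompt alone; a real backbone stays frozen."""
    p = init.copy()
    for _ in range(steps):
        p -= lr * grad_fn(p)
    return p

# Hypothetical quadratic "task losses": each task has an optimal prompt, and
# the source and target optima are deliberately close (related tasks).
source_opt = np.ones((PROMPT_LEN, EMB))
target_opt = 1.1 * np.ones((PROMPT_LEN, EMB))
source_grad = lambda p: p - source_opt  # gradient of 0.5 * ||p - source_opt||^2
target_grad = lambda p: p - target_opt

rng = np.random.default_rng(0)

# Step 1: learn a prompt on the source task from random initialization.
source_prompt = train_prompt(source_grad,
                             rng.normal(size=(PROMPT_LEN, EMB)), steps=100)

# Step 2 (the SPoT idea): initialize the target prompt from the source prompt
# rather than from scratch, and tune both under the same small step budget.
budget = 5
transferred = train_prompt(target_grad, source_prompt, steps=budget)
scratch = train_prompt(target_grad,
                       rng.normal(size=(PROMPT_LEN, EMB)), steps=budget)

target_loss = lambda p: 0.5 * np.sum((p - target_opt) ** 2)
```

Because the transferred prompt starts near the target optimum, `target_loss(transferred)` should end up well below `target_loss(scratch)` under the same budget; the paper measures this effect on real NLP tasks with a frozen backbone rather than a toy objective.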


On Transferability of Prompt Tuning for Natural Language Understanding
The findings show that improving prompt tuning with knowledge transfer is possible and promising, and that prompts' cross-task transferability is generally better than their cross-model transferability.
OpenPrompt: An Open-source Framework for Prompt-learning
OpenPrompt is a research-friendly framework designed for efficiency, modularity, and extensibility; its combinability allows freely combining different PLMs, task formats, and prompting modules in a unified paradigm.
A Survey on Green Deep Learning
This paper presents a systematic review of the development of green deep learning technologies, classifying these approaches into four categories: (1) compact networks, (2) energy-efficient training strategies, (3) energy-efficient inference approaches, and (4) efficient data usage.


PPT: Pre-trained Prompt Tuning for Few-shot Learning
It is found that prompt tuning performs comparably with conventional full-model fine-tuning when downstream data are sufficient, whereas it performs much worse under few-shot learning settings, which may hinder the application of prompt tuning in practice.
The Power of Scale for Parameter-Efficient Prompt Tuning
This work explores “prompt tuning”, a simple yet effective mechanism for learning “soft prompts” to condition frozen language models to perform specific downstream tasks, and shows that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning.
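The conditioning mechanism this entry describes — trainable "soft prompt" vectors prepended to the input embeddings of a model whose own weights never change — can be sketched in miniature. A fixed random matrix stands in for the frozen language model, and a toy scalar loss replaces the real objective; all names and shapes here are illustrative assumptions, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
EMB, PROMPT_LEN, SEQ = 6, 3, 5  # illustrative sizes

W_frozen = rng.normal(size=(EMB, EMB))       # stands in for the frozen LM
x = rng.normal(size=(SEQ, EMB))              # fixed input token embeddings
prompt = rng.normal(size=(PROMPT_LEN, EMB))  # the only trainable parameters

def forward(prompt):
    # Prepend the soft prompt to the input embeddings, then apply the
    # frozen transformation; the mean serves as a toy scalar "prediction".
    h = np.concatenate([prompt, x], axis=0) @ W_frozen
    return h.mean()

target = 1.0
n_rows = PROMPT_LEN + SEQ

def grad(prompt):
    # Analytic gradient of (forward(prompt) - target)**2 w.r.t. the prompt:
    # every prompt row sees the same gradient W_frozen.sum(axis=1) / (n_rows * EMB),
    # because the loss is a mean over all output entries.
    gpred = np.tile(W_frozen.sum(axis=1) / (n_rows * EMB), (PROMPT_LEN, 1))
    return 2.0 * (forward(prompt) - target) * gpred

init_loss = (forward(prompt) - target) ** 2
for _ in range(500):
    prompt -= 2.0 * grad(prompt)  # gradient steps update the prompt only
final_loss = (forward(prompt) - target) ** 2
```

Only `PROMPT_LEN * EMB` parameters are ever updated, which is the parameter-efficiency argument: the same frozen backbone serves every task, with one small prompt per task.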
Making Pre-trained Language Models Better Few-shot Learners
The LM-BFF approach makes minimal assumptions on task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning.
Learning How to Ask: Querying LMs with Mixtures of Soft Prompts
This work explores learning prompts by gradient descent, either by fine-tuning prompts taken from previous work or by starting from random initialization, and shows that the implicit factual knowledge in language models was previously underestimated.
Intermediate-Task Transfer Learning with Pretrained Language Models: When and Why Does It Work?
It is observed that intermediate tasks requiring high-level inference and reasoning abilities tend to work best, and that target task performance is strongly correlated with higher-level abilities such as coreference resolution, but more granular correlations between probing and target task performance were not observed.
Parameter-Efficient Transfer Learning for NLP
To demonstrate adapters' effectiveness, the recently proposed BERT Transformer model is transferred to 26 diverse text classification tasks, including the GLUE benchmark; adapters attain near state-of-the-art performance while adding only a few parameters per task.
Prefix-Tuning: Optimizing Continuous Prompts for Generation
Prefix-tuning is proposed, a lightweight alternative to fine-tuning for natural language generation tasks, which keeps language model parameters frozen but optimizes a small continuous task-specific vector (called the prefix).
LoRA: Low-Rank Adaptation of Large Language Models
Low-Rank Adaptation, or LoRA, is proposed, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.
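The parameterization this entry describes — a frozen weight matrix adapted additively through a trainable low-rank factorization — can be sketched in a few lines. The shapes and rank below are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Toy sketch of the LoRA parameterization: the frozen weight W is adapted as
# W + B @ A, where only the low-rank factors A and B receive gradients.
d_out, d_in, r = 64, 64, 4
rng = np.random.default_rng(0)

W_frozen = rng.normal(size=(d_out, d_in))  # pre-trained weight, never updated
A = rng.normal(size=(r, d_in)) * 0.01      # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection; zero init
                                           # means the adapted model starts out
                                           # identical to the pre-trained one

def adapted_forward(x):
    # Equivalent to (W_frozen + B @ A) @ x, computed without ever
    # materializing the full d_out x d_in update matrix.
    return W_frozen @ x + B @ (A @ x)

x = rng.normal(size=d_in)

full_params = W_frozen.size   # 64 * 64 = 4096 if fine-tuned fully
lora_params = A.size + B.size # r * (d_in + d_out) = 4 * 128 = 512 trainable
```

The parameter count `r * (d_in + d_out)` grows linearly rather than quadratically in the layer width, which is where the "greatly reducing the number of trainable parameters" claim comes from.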
Can You Tell Me How to Get Past Sesame Street? Sentence-Level Pretraining Beyond Language Modeling
The first large-scale systematic study of candidate pretraining tasks compares 19 different tasks both as alternatives and as complements to language modeling; the primary results support the use of language modeling, especially when combined with pretraining on additional labeled-data tasks.
Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks
Supplementary training on data-rich supervised tasks, such as natural language inference, yields additional performance improvements on the GLUE benchmark, as well as reduced variance across random restarts in this setting.