On Transferability of Prompt Tuning for Natural Language Processing

Authors: Yusheng Su, Xiaozhi Wang, Yujia Qin, Chi-Min Chan, Yankai Lin, Huadong Wang, Kaiyue Wen, Zhiyuan Liu, Peng Li, Juan-Zi Li, Lei Hou, Maosong Sun, Jie Zhou
Prompt tuning (PT) is a promising parameter-efficient method to utilize extremely large pre-trained language models (PLMs), which can achieve comparable performance to full-parameter fine-tuning by only tuning a few soft prompts. However, PT requires much more training time than fine-tuning. Intuitively, knowledge transfer can help to improve the efficiency. To explore whether we can improve PT via prompt transfer, we empirically investigate the transferability of soft prompts across different… 
Neuro-Symbolic Causal Language Planning with Commonsense Prompting
A Neuro-Symbolic Causal Language Planner (CLAP) is proposed that uses commonsense-infused prompting to elicit procedural knowledge from LLMs and solve the language planning problem in a zero-shot manner.


SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer
It is shown that SPoT significantly boosts the performance of Prompt Tuning across many tasks, and an efficient retrieval approach is proposed that interprets task prompts as task embeddings to identify similar tasks and predict the most transferable source tasks for a novel target task.
Exploring and Predicting Transferability across NLP Tasks
The results show that transfer learning is more beneficial than previously thought, especially when target task data is scarce, and can improve performance even when the source task is small or differs substantially from the target task.
GPT Understands, Too
It is shown that GPTs can be better than or comparable to similar-sized BERTs on NLU tasks with a novel method, P-tuning, which employs trainable continuous prompt embeddings and outperforms the state-of-the-art approaches on the few-shot SuperGLUE benchmark.
The Power of Scale for Parameter-Efficient Prompt Tuning
This work explores “prompt tuning”, a simple yet effective mechanism for learning “soft prompts” to condition frozen language models to perform specific downstream tasks, and shows that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning.
Making Pre-trained Language Models Better Few-shot Learners
The LM-BFF approach makes minimal assumptions on task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning.
Knowledge Inheritance for Pre-trained Language Models
A pre-training framework named “knowledge inheritance” (KI) is introduced, and the work explores how knowledge distillation can serve as auxiliary supervision during pre-training to efficiently learn larger PLMs, demonstrating the superiority of KI in training efficiency.
Learning How to Ask: Querying LMs with Mixtures of Soft Prompts
This work explores the idea of learning prompts by gradient descent—either fine-tuning prompts taken from previous work, or starting from random initialization, showing that the implicit factual knowledge in language models was previously underestimated.
Prefix-Tuning: Optimizing Continuous Prompts for Generation
Prefix-tuning is proposed as a lightweight alternative to fine-tuning for natural language generation tasks; it keeps language model parameters frozen and instead optimizes a sequence of continuous task-specific vectors, called the prefix.
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
Parameter-Efficient Transfer Learning for NLP
To demonstrate the effectiveness of adapters, the recently proposed BERT Transformer model is transferred to 26 diverse text classification tasks, including the GLUE benchmark; adapters attain near state-of-the-art performance while adding only a few parameters per task.