Continued Pretraining for Better Zero- and Few-Shot Promptability

@article{Wu2022ContinuedPF,
  title={Continued Pretraining for Better Zero- and Few-Shot Promptability},
  author={Zhaofeng Wu and Robert L. Logan IV and Pete Walsh and Akshita Bhagia and Dirk Groeneveld and Sameer Singh and Iz Beltagy},
  journal={ArXiv},
  year={2022},
  volume={abs/2210.10258}
}
Recently introduced language model prompting methods can achieve high accuracy in zero- and few-shot settings while requiring few to no learned task-specific parameters. Nevertheless, these methods still often trail behind full model finetuning. In this work, we investigate if a dedicated continued pretraining stage could improve “promptability”, i.e., zero-shot performance with natural language prompts or few-shot performance with prompt tuning. We reveal settings where existing continued…

Citations

HyperTuning: Toward Adapting Large Language Models without Back-propagation

This work proposes HyperTuning, a novel approach to model adaptation that uses a hypermodel to generate task-specific parameters for a fixed downstream model, and shows that using hypermodel-generated parameters as initializations for further parameter-efficient fine-tuning improves performance.
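As a rough illustration of that setup, here is a minimal, hypothetical sketch rather than the paper's implementation: a small hypernetwork pools encodings of a few task examples and emits a soft prompt that could parameterize a frozen downstream model. All module names, shapes, and the mean-pooling choice are assumptions.

```python
import torch
import torch.nn as nn


class PromptHypermodel(nn.Module):
    """Toy hypermodel: maps encodings of few-shot examples to a soft prompt."""

    def __init__(self, d_model: int, prompt_length: int):
        super().__init__()
        self.prompt_length = prompt_length
        self.net = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.ReLU(),
            nn.Linear(4 * d_model, prompt_length * d_model),
        )

    def forward(self, example_encodings: torch.Tensor) -> torch.Tensor:
        # Pool the encoded demonstrations, then predict prompt embeddings
        # that a frozen downstream model could consume (or use as an init
        # for further parameter-efficient fine-tuning).
        pooled = example_encodings.mean(dim=1)                       # (B, d)
        prompt = self.net(pooled)                                    # (B, P * d)
        return prompt.view(-1, self.prompt_length, example_encodings.size(-1))


hyper = PromptHypermodel(d_model=32, prompt_length=8)
fewshot = torch.randn(1, 4, 32)   # 4 encoded demonstrations for one task (made up)
soft_prompt = hyper(fewshot)      # (1, 8, 32): generated task-specific parameters
print(soft_prompt.shape)
```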

References

(Showing 10 of 32 references)

PPT: Pre-trained Prompt Tuning for Few-shot Learning

This work proposes to pre-train prompts by adding soft prompts into the pre-training stage to obtain a better initialization, and names this Pre-trained Prompt Tuning framework “PPT”.
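To make the initialization idea concrete, the following is a minimal sketch under assumed shapes, not PPT's actual code: downstream prompt tuning starts from a prompt matrix produced by a pre-training stage instead of from random embeddings.

```python
import torch
import torch.nn as nn

prompt_length, d_model = 8, 32

# Stand-in for a prompt matrix learned during a prompt pre-training stage;
# in practice this would be loaded from a checkpoint.
pretrained_prompt = torch.randn(prompt_length, d_model)

# Vanilla prompt tuning starts from a random (or vocab-sampled) initialization ...
random_init = nn.Parameter(torch.randn(prompt_length, d_model) * 0.02)
# ... whereas a PPT-style setup starts downstream tuning from the pre-trained prompt.
ppt_init = nn.Parameter(pretrained_prompt.clone())
print(random_init.shape, ppt_init.shape)
```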

Making Pre-trained Language Models Better Few-shot Learners

The LM-BFF approach makes minimal assumptions on task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning.

Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models

This work shows that finetuning LMs in the few-shot setting can considerably reduce the need for prompt engineering, and recommends finetuned LMs for few-shot learning as they are more accurate, robust to different prompts, and can be made nearly as efficient as using frozen LMs.

SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer

It is shown that SPoT significantly boosts the performance of Prompt Tuning across many tasks, and an efficient retrieval approach is proposed that interprets task prompts as task embeddings to identify similar tasks and predict the most transferable source tasks for a novel target task.
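The retrieval step can be sketched in a few lines. This is an illustrative approximation rather than SPoT's implementation; the task names, random vectors, and the use of cosine similarity over flattened prompts are assumptions.

```python
import torch
import torch.nn.functional as F

# Hypothetical learned prompts for a few source tasks, flattened to one vector each.
source_prompts = {
    "nli": torch.randn(64),
    "qa": torch.randn(64),
    "sentiment": torch.randn(64),
}
target_prompt = torch.randn(64)   # prompt learned on the novel target task

# Treat prompts as task embeddings and rank source tasks by similarity,
# most transferable first.
ranked = sorted(
    source_prompts,
    key=lambda name: F.cosine_similarity(source_prompts[name], target_prompt, dim=0).item(),
    reverse=True,
)
print("Predicted most transferable source tasks:", ranked)
```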

Multitask Prompted Training Enables Zero-Shot Task Generalization

A system for easily mapping any natural language task into a human-readable prompted form is developed, and a pretrained encoder-decoder model is fine-tuned on this multitask mixture covering a wide variety of tasks.

Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning

A new parameter-efficient fine-tuning method called (IA)³ is proposed that scales activations by learned vectors, attaining stronger performance while introducing only a relatively tiny number of new parameters.
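A minimal sketch of the scaling idea follows, as an illustration only and not the released implementation: learned vectors, initialized to ones, elementwise-rescale keys and values inside a toy attention block while all pretrained weights stay frozen. The actual method also rescales the feed-forward intermediate activations, which is omitted here for brevity.

```python
import torch
import torch.nn as nn


class IA3SelfAttention(nn.Module):
    """Toy single-head self-attention with (IA)^3-style learned scaling vectors."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model, bias=False)
        self.k = nn.Linear(d_model, d_model, bias=False)
        self.v = nn.Linear(d_model, d_model, bias=False)
        # The only new parameters: one scaling vector for keys, one for values,
        # initialized to ones so training starts from the frozen model's behavior.
        self.l_k = nn.Parameter(torch.ones(d_model))
        self.l_v = nn.Parameter(torch.ones(d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.q(x)
        k = self.k(x) * self.l_k          # rescale keys elementwise
        v = self.v(x) * self.l_v          # rescale values elementwise
        attn = torch.softmax(q @ k.transpose(-2, -1) / x.size(-1) ** 0.5, dim=-1)
        return attn @ v


model = IA3SelfAttention(d_model=16)
# Freeze everything except the scaling vectors, so only 2 * d_model parameters
# are updated during few-shot fine-tuning.
for name, p in model.named_parameters():
    p.requires_grad = name in {"l_k", "l_v"}

out = model(torch.randn(2, 5, 16))        # (batch, seq, d_model)
print(out.shape, sum(p.numel() for p in model.parameters() if p.requires_grad))
```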

The Power of Scale for Parameter-Efficient Prompt Tuning

This work explores “prompt tuning”, a simple yet effective mechanism for learning “soft prompts” to condition frozen language models to perform specific downstream tasks, and shows that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning.
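To make the mechanism concrete, here is a minimal sketch under assumed shapes, not the paper's code: a small matrix of learned prompt embeddings is prepended to the input embeddings, and only that matrix receives gradients while the backbone stays frozen.

```python
import torch
import torch.nn as nn


class SoftPromptWrapper(nn.Module):
    def __init__(self, backbone: nn.Module, embed: nn.Embedding,
                 prompt_length: int, d_model: int):
        super().__init__()
        self.backbone = backbone
        self.embed = embed
        # The only trainable parameters: prompt_length "virtual token" embeddings.
        self.soft_prompt = nn.Parameter(torch.randn(prompt_length, d_model) * 0.02)
        for p in list(backbone.parameters()) + list(embed.parameters()):
            p.requires_grad = False        # keep the language model frozen

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        tok = self.embed(input_ids)                              # (B, T, d)
        prompt = self.soft_prompt.expand(tok.size(0), -1, -1)    # (B, P, d)
        return self.backbone(torch.cat([prompt, tok], dim=1))


d_model, vocab = 32, 100
backbone = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                         nn.Linear(d_model, vocab))              # stand-in for a frozen LM
embed = nn.Embedding(vocab, d_model)
model = SoftPromptWrapper(backbone, embed, prompt_length=8, d_model=d_model)
logits = model(torch.randint(0, vocab, (2, 10)))
print(logits.shape)  # (2, 18, vocab): 8 prompt positions + 10 token positions
```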

Noisy Channel Language Model Prompting for Few-Shot Text Classification

A noisy channel approach for language model prompting in few-shot text classification is introduced, using channel models for recently proposed few-shot learning methods with no or very limited updates to the language model parameters, via either in-context demonstration or prompt tuning.
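The scoring direction is the crux, so a small sketch may help. This is a toy illustration rather than the paper's code: lm_logprob is a hypothetical stand-in for scoring a continuation with a pretrained LM, and the label verbalizers are made up.

```python
LABEL_WORDS = {"positive": "It was great.", "negative": "It was terrible."}

def lm_logprob(prefix: str, continuation: str) -> float:
    # Stand-in scorer: a real system would return the LM log-probability of
    # `continuation` given `prefix`. Here we use a toy length-based score.
    return -0.1 * len(continuation)

def direct_predict(x: str) -> str:
    # Direct prompting: argmax_y  log P(verbalizer(y) | x)
    return max(LABEL_WORDS, key=lambda y: lm_logprob(x, LABEL_WORDS[y]))

def channel_predict(x: str) -> str:
    # Channel prompting: argmax_y  log P(x | verbalizer(y)) (+ log P(y), uniform here)
    return max(LABEL_WORDS, key=lambda y: lm_logprob(LABEL_WORDS[y], x))

print(direct_predict("The movie dragged on forever."),
      channel_predict("The movie dragged on forever."))
```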

P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks

The method P-Tuning v2 is an implementation of deep prompt tuning optimized and adapted for NLU, and can serve as an alternative to finetuning and a strong baseline for future research.

Large Language Models are Zero-Shot Reasoners

Experimental results demonstrate that Zero-shot-CoT, using the same single prompt template, significantly outperforms standard zero-shot LLM performance on diverse benchmark reasoning tasks including arithmetic, symbolic reasoning, and other logical reasoning tasks, without any hand-crafted few-shot examples.
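The two-stage prompting scheme is simple enough to sketch. The snippet below is illustrative only; generate is a hypothetical stand-in for an LLM call, while the trigger phrase and answer-extraction phrase follow the ones reported for Zero-shot-CoT.

```python
def generate(prompt: str) -> str:
    # Stand-in for an LLM call; a real implementation would query a language model.
    return "[model output]"

def zero_shot_cot(question: str) -> str:
    # Stage 1: elicit a reasoning chain with the single fixed trigger phrase.
    reasoning = generate(f"Q: {question}\nA: Let's think step by step.")
    # Stage 2: feed the reasoning back and extract the final answer.
    return generate(
        f"Q: {question}\nA: Let's think step by step. {reasoning}\n"
        "Therefore, the answer is"
    )

print(zero_shot_cot("If I have 3 apples and buy 2 more, how many do I have?"))
```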