Noisy Channel Language Model Prompting for Few-Shot Text Classification

Sewon Min, Michael Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer
We introduce a noisy channel approach for language model prompting in few-shot text classification. Instead of computing the likelihood of the label given the input (referred to as direct models), channel models compute the conditional probability of the input given the label, and are thereby required to explain every word in the input. We use channel models for recently proposed few-shot learning methods with no or very limited updates to the language model parameters, via either in-context…
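As a rough illustration of the distinction between direct and channel scoring, the sketch below picks a label either by log P(label | input) or by log P(input | label) plus an optional label prior. It uses hypothetical per-token log-probabilities in place of a real language model; the function names and all numbers are assumptions made for illustration, not the authors' implementation.

```python
import math


def direct_predict(label_logprobs):
    """Direct scoring: pick the label with the highest log P(label | input)."""
    return max(label_logprobs, key=label_logprobs.get)


def channel_predict(input_token_logprobs, log_prior=None):
    """Channel scoring: pick the label maximizing log P(input | label) + log P(label).

    input_token_logprobs maps each label to the per-token log-probabilities of
    the input conditioned on that label, so every input token contributes to
    the score. A missing prior is treated as uniform.
    """
    scores = {}
    for label, token_logprobs in input_token_logprobs.items():
        prior = 0.0 if log_prior is None else log_prior[label]
        scores[label] = sum(token_logprobs) + prior
    return max(scores, key=scores.get)


# Hypothetical scores standing in for language model outputs.
direct_scores = {"positive": math.log(0.55), "negative": math.log(0.45)}
channel_scores = {
    "positive": [math.log(0.10), math.log(0.05), math.log(0.02)],
    "negative": [math.log(0.10), math.log(0.20), math.log(0.30)],
}

direct_choice = direct_predict(direct_scores)
channel_choice = channel_predict(channel_scores)
```

With these made-up numbers the two scoring rules disagree, which shows only that they are different decision rules, not that either is more accurate.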

Learning To Retrieve Prompts for In-Context Learning

This work proposes an efficient method for retrieving prompts for in-context learning using annotated data and an LM, and trains an efficient dense retriever from this data, which is used to retrieve training examples as prompts at test time.

Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners

A novel pluggable, extensible, and efficient approach named DifferentiAble pRompT (DART), which can convert small language models into better few-shot learners.

AdaPrompt: Adaptive Model Training for Prompt-based NLP

AdaPrompt is proposed, which adaptively retrieves external data for continual pretraining of PLMs by making use of both task and prompt characteristics, and uses knowledge in Natural Language Inference models to derive adaptive verbalizers.

Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning

A new parameter-efficient fine-tuning method called (IA)³ is introduced, which scales activations by learned vectors and attains stronger performance while introducing only a relatively tiny number of new parameters.
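To make "scaling activations by learned vectors" concrete, here is a minimal sketch of element-wise rescaling. Which activations are rescaled and how the vector is trained are not stated in the summary above; the function name, the shapes, and the values are assumptions for illustration only.

```python
def scale_activations(activations, learned_vector):
    """Multiply each activation row element-wise by a vector of scaling factors.

    activations: list of rows, each of length d (one row per token position).
    learned_vector: list of length d. It is the only new parameter here, so
    the method adds d values per rescaled activation.
    """
    return [
        [a * scale for a, scale in zip(row, learned_vector)]
        for row in activations
    ]


# Hypothetical activations for two token positions with hidden size 3.
hidden = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

# A vector of ones leaves the activations unchanged.
unchanged = scale_activations(hidden, [1.0, 1.0, 1.0])

# A hypothetical "learned" vector amplifies or suppresses individual dimensions.
rescaled = scale_activations(hidden, [0.5, 1.0, 2.0])
```

The number of added parameters equals the length of the scaling vector, which is why the overhead is small compared with updating full weight matrices.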

Making Pre-trained Language Models End-to-end Few-shot Learners with Contrastive Prompt Tuning

CP-Tuning is presented, the first end-to-end Contrastive Prompt Tuning framework for fine-tuning PLMs without any manual engineering of task-specific prompts and verbalizers, and it is integrated with the task-invariant continuous prompt encoding technique with fully trainable prompt parameters.

Semantic-Oriented Unlabeled Priming for Large-Scale Language Models

Semantic-Oriented Unlabeled Priming (SOUP) is introduced, a method that classifies examples by retrieving semantically similar unlabeled examples, assigning labels to them in a zero-shot fashion, and then using them for in-context learning.

Zero- and Few-Shot NLP with Pretrained Language Models

This tutorial aims at bringing interested NLP researchers up to speed about the recent and ongoing techniques for zero- and few-shot learning with pretrained language models.

No More Fine-Tuning? An Experimental Evaluation of Prompt Tuning in Code Intelligence

Pre-trained models have been shown to be effective in many code intelligence tasks. These models are pre-trained on large-scale unlabeled corpora and then fine-tuned on downstream tasks. However, as the

Adapting Document-Grounded Dialog Systems to Spoken Conversations using Data Augmentation and a Noisy Channel Model

This paper summarizes the submission to Task 2 of the second track of the 10th Dialog System Technology Challenge (DSTC10) “Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations” and explores different approaches to make the models more robust to this type of input and to adapt the generated responses to the style of spoken conversations.

P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks

This work presents the empirical finding that properly optimized prompt tuning can be universally effective across a wide range of model scales and NLU tasks, matching the performance of fine-tuning while tuning only 0.1%-3% of the parameters.



Making Pre-trained Language Models Better Few-shot Learners

The LM-BFF approach makes minimal assumptions on task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning.

Learning How to Ask: Querying LMs with Mixtures of Soft Prompts

This work explores the idea of learning prompts by gradient descent—either fine-tuning prompts taken from previous work, or starting from random initialization, showing that the implicit factual knowledge in language models was previously underestimated.

Calibrate Before Use: Improving Few-Shot Performance of Language Models

This work first estimates the model's bias towards each answer by asking for its prediction when given the training prompt and a content-free test input such as "N/A", and then fits calibration parameters that cause the prediction for this input to be uniform across answers.
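The calibration step described above can be sketched as follows. One simple choice of calibration parameters that makes the content-free prediction uniform is to divide each answer probability by the probability the model assigns to that answer for the content-free input, then renormalize. The function name and the probabilities are hypothetical; this is a sketch under that assumption, not the paper's code.

```python
def calibrate(probs, content_free_probs):
    """Rescale answer probabilities by the inverse of the content-free bias.

    probs: the model's probabilities over answers for a real test input.
    content_free_probs: the model's probabilities over the same answers for a
    content-free input such as "N/A".
    """
    scaled = [p / bias for p, bias in zip(probs, content_free_probs)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Hypothetical bias: the model favors answer 0 even for a content-free input.
content_free = [0.7, 0.3]

# Hypothetical prediction for a real test input.
test_probs = [0.6, 0.4]

calibrated = calibrate(test_probs, content_free)
calibrated_content_free = calibrate(content_free, content_free)
```

By construction the content-free input maps to a uniform distribution, and in this made-up case the prediction for the test input moves from answer 0 to answer 1 once the bias is removed.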

True Few-Shot Learning with Language Models

This work evaluates the few-shot ability of LMs when such held-out examples are unavailable, a setting the authors call true few-shot learning, and suggests that prior work significantly overestimated the true few-shot ability of LMs given the difficulty of few-shot model selection.

Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models

This work shows that finetuning LMs in the few-shot setting can considerably reduce the need for prompt engineering, and recommends finetuned LMs for few-shot learning as it is more accurate, robust to different prompts, and can be made nearly as efficient as using frozen LMs.

Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity

This work uses the generative nature of language models to construct an artificial development set. Based on entropy statistics of the candidate permutations on this set, it identifies performant prompts and yields a 13% relative improvement for GPT-family models across eleven different established text classification tasks.
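One plausible form of such an entropy statistic is the entropy of the labels a model predicts over the artificial development set under each prompt ordering, with higher-entropy orderings ranked first on the assumption that they are less biased toward a single label. The sketch below makes that assumption explicit; the function names and the predictions are hypothetical stand-ins for real model outputs.

```python
import math
from collections import Counter


def label_entropy(predicted_labels):
    """Entropy (in nats) of the empirical distribution of predicted labels."""
    counts = Counter(predicted_labels)
    n = len(predicted_labels)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def rank_permutations(predictions_by_permutation):
    """Order prompt permutations from highest to lowest label entropy."""
    return sorted(
        predictions_by_permutation,
        key=lambda name: label_entropy(predictions_by_permutation[name]),
        reverse=True,
    )


# Hypothetical predictions on an eight-example artificial development set.
predictions = {
    "order_a": ["pos"] * 8,                # always predicts one label
    "order_b": ["pos", "neg"] * 4,         # balanced predictions
    "order_c": ["pos"] * 6 + ["neg"] * 2,  # mildly skewed predictions
}

ranking = rank_permutations(predictions)
```

An ordering that always predicts the same label has zero entropy and ends up last in the ranking.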

The Power of Scale for Parameter-Efficient Prompt Tuning

This work explores “prompt tuning”, a simple yet effective mechanism for learning “soft prompts” to condition frozen language models to perform specific downstream tasks, and shows that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning.

Language Models are Unsupervised Multitask Learners

It is demonstrated that language models begin to learn natural language processing tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.

Semi-Supervised Sequence Modeling with Cross-View Training

Cross-View Training (CVT), a semi-supervised learning algorithm that improves the representations of a Bi-LSTM sentence encoder using a mix of labeled and unlabeled data, is proposed and evaluated, achieving state-of-the-art results.

Simple and Effective Noisy Channel Modeling for Neural Machine Translation

This work pursues an alternative approach based on standard sequence-to-sequence models that utilize the entire source, and shows that these models perform remarkably well as channel models, even though they have neither been trained on, nor designed to factor over, incomplete target sentences.