Making Pre-trained Language Models Better Few-shot Learners

Authors: Tianyu Gao, Adam Fisch, Danqi Chen
The recent GPT-3 model (Brown et al., 2020) achieves remarkable few-shot performance solely by leveraging a natural-language prompt and a few task demonstrations as input context. Inspired by their findings, we study few-shot learning in a more practical scenario, where we use smaller language models for which fine-tuning is computationally efficient. We present LM-BFF—better few-shot fine-tuning of language models—a suite of simple and complementary techniques for fine-tuning language models… 
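To make prompt-based fine-tuning with demonstrations more concrete, here is a minimal sketch of how a sentiment example could be turned into a masked-LM input. The template `It was <mask>.`, the label words `great`/`terrible`, the `<mask>`/`</s>` tokens, the choice of one demonstration per class, and all helper names are illustrative assumptions, not the paper's released code.

```python
from __future__ import annotations

from dataclasses import dataclass

MASK = "<mask>"  # assumed RoBERTa-style mask token
SEP = "</s>"     # assumed separator between the query and each demonstration

# Hypothetical label words: each class is scored through one vocabulary word at the mask.
LABEL_WORDS = {"positive": "great", "negative": "terrible"}


@dataclass
class Example:
    text: str
    label: str


def fill_template(text: str, word: str) -> str:
    """Apply the assumed template '<text> It was <word>.'."""
    return f"{text} It was {word}."


def build_input(query: str, demonstrations: list[Example]) -> str:
    """Masked query first, then each demonstration with its label word filled in."""
    parts = [fill_template(query, MASK)]
    parts += [fill_template(d.text, LABEL_WORDS[d.label]) for d in demonstrations]
    return f" {SEP} ".join(parts)


def predict(mask_scores: dict[str, float]) -> str:
    """Return the class whose label word scores highest at the mask position."""
    return max(LABEL_WORDS, key=lambda label: mask_scores[LABEL_WORDS[label]])


demos = [Example("A moving, beautiful film.", "positive"),
         Example("Dull and far too long.", "negative")]
print(build_input("The plot never lands.", demos))
# In practice these scores would come from the masked LM; the numbers here are made up.
print(predict({"great": 0.1, "terrible": 0.7}))
```

Passing the mask-position scores in directly keeps the sketch self-contained; with a real model they would be read from the LM's output distribution at the `<mask>` token, and fine-tuning would update the LM so that the correct label word wins.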

Prompting for Multimodal Hateful Meme Classification

This work proposes PromptHate, a simple yet effective prompt-based model for hateful meme classification. It constructs simple prompts and provides a few in-context examples to exploit the implicit knowledge in the pre-trained RoBERTa language model.

Investigating the Characteristics of a Transformer in a Few-Shot Setup: Does Freezing Layers in RoBERTa Help?

Freezing the initial 50% of Transformer layers not only reduces training time but also, surprisingly, improves Macro F1 (by up to 8%) compared with fully trainable layers in the few-shot setup. The approach generalizes to state-of-the-art few-shot text classification techniques, significantly reducing training time while maintaining comparable performance.
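A minimal sketch of the layer-freezing setup, using a toy stand-in for the encoder: each layer is a hypothetical `Layer` record with a `trainable` flag (in PyTorch this would correspond to setting `requires_grad = False` on that layer's parameters). The 50% fraction follows the summary; the layer count and sizes are made up.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    index: int
    num_params: int
    trainable: bool = True


def freeze_initial_layers(layers, fraction=0.5):
    """Freeze the first `fraction` of layers (rounded down); return how many were frozen."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    n_frozen = int(len(layers) * fraction)
    for layer in layers[:n_frozen]:
        layer.trainable = False
    return n_frozen


def trainable_share(layers):
    """Fraction of encoder parameters that still receive gradient updates."""
    total = sum(layer.num_params for layer in layers)
    trainable = sum(layer.num_params for layer in layers if layer.trainable)
    return trainable / total


# A hypothetical 12-layer encoder with equal-sized layers.
encoder = [Layer(i, num_params=7_000_000) for i in range(12)]
frozen = freeze_initial_layers(encoder, 0.5)
print(frozen, trainable_share(encoder))  # 6 0.5
```

With a real RoBERTa checkpoint, the analogous step would loop over the parameters of the first half of the encoder layers and disable their gradients; the exact attribute path depends on the model class.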

PromptBoosting: Black-Box Text Classification with Ten Forward Passes


Beyond prompting: Making Pre-trained Language Models Better Zero-shot Learners by Clustering Representations

This work shows that zero-shot text classification can be improved simply by clustering texts in the embedding spaces of PLMs. This indicates that PLM embeddings can categorize texts without task-specific fine-tuning, providing a new way to analyze and utilize their knowledge and zero-shot learning ability.
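The clustering idea can be sketched with plain k-means over text embeddings. Below, 2-D vectors stand in for PLM embeddings, and initializing one centroid per class from a "seed" embedding (for instance, an embedding of the label name) is an illustrative assumption; the paper's exact clustering algorithm and cluster-to-label mapping may differ.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_centroids, iters=20):
    """Plain k-means; returns each point's cluster index and the final centroids."""
    centroids = [list(c) for c in init_centroids]
    assign = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_assign = [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
                      for p in points]
        if new_assign == assign:
            break  # converged
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return assign, centroids


def zero_shot_labels(embeddings, label_seeds):
    """Cluster the embeddings, seeding one centroid per label (hypothetical mapping)."""
    names = list(label_seeds)
    assign, _ = kmeans(embeddings, [label_seeds[n] for n in names])
    return [names[a] for a in assign]


emb = [(0.9, 0.1), (1.0, 0.2), (0.8, 0.0), (0.1, 0.9), (0.0, 1.0), (0.2, 0.8)]
print(zero_shot_labels(emb, {"sports": (1.0, 0.0), "politics": (0.0, 1.0)}))
```

No labeled training data is used: the only task information is the seed vector per class, which is what makes the setup zero-shot.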

Cold-Start Data Selection for Few-shot Language Model Fine-tuning: A Prompt-Based Uncertainty Propagation Approach

The authors design a prompt-based uncertainty propagation approach to estimate the importance of data points, and a partition-then-rewrite (PTR) strategy to promote sample diversity when querying for annotations.
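As a rough illustration of uncertainty propagation for cold-start selection: compute each unlabeled point's predictive entropy from (hypothetical) prompt-based class probabilities, then add kernel-weighted entropies of its nearest neighbours in embedding space, so that points in uncertain regions score higher. The Gaussian-kernel weighting, `k`, and `beta` are assumptions; this is not the paper's exact formula, and the partition-then-rewrite step is not reproduced.

```python
import math


def entropy(probs):
    """Shannon entropy (nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def propagated_uncertainty(embeddings, probs, k=2, beta=1.0):
    """Own entropy plus kernel-weighted entropies of the k nearest neighbours."""
    u = [entropy(p) for p in probs]
    scores = []
    for i, x in enumerate(embeddings):
        neighbours = sorted((math.dist(x, y), j)
                            for j, y in enumerate(embeddings) if j != i)[:k]
        scores.append(u[i] + sum(math.exp(-beta * d * d) * u[j] for d, j in neighbours))
    return scores


def select_for_annotation(embeddings, probs, budget, **kwargs):
    """Indices of the `budget` highest-scoring points."""
    scores = propagated_uncertainty(embeddings, probs, **kwargs)
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:budget]


emb = [(0.0, 0.0), (0.0, 0.1), (5.0, 5.0), (5.0, 5.1)]
probs = [[0.99, 0.01], [0.98, 0.02], [0.5, 0.5], [0.6, 0.4]]
print(select_for_annotation(emb, probs, budget=2, k=1))
```

Smoothing over neighbours favours points that sit in regions the model is collectively unsure about, rather than isolated outliers with a single high-entropy prediction.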

IDIAPers @ Causal News Corpus 2022: Efficient Causal Relation Identification Through a Prompt-based Few-shot Approach

This paper addresses the Causal Relation Identification (CRI) task with a set of simple yet complementary techniques for fine-tuning language models (LMs) on a few annotated examples (i.e., a few-shot configuration), treating CRI as a masked language modeling (MLM) problem.

Let Me Check the Examples: Enhancing Demonstration Learning via Explicit Imitation

Imitation DEMOnstration Learning (Imitation-Demo) strengthens demonstration learning by explicitly imitating human review behaviour. It combines a contrastive learning mechanism that concentrates on similar demonstrations with a demonstration-label re-prediction method that consolidates known knowledge.

PromptGen: Automatically Generate Prompts using Generative Models

This paper proposes PromptGen, a model that automatically generates prompts conditioned on the input sentence. Built on a pre-trained generative model, it is the first work to consider dynamic prompt generation for knowledge probing.

ZeroGen+: Self-Guided High-Quality Data Generation in Efficient Zero-Shot Learning

This work proposes a noise-robust bi-level re-weighting framework that learns per-sample weights measuring data quality without requiring any gold data.

Contrastive Learning for Prompt-based Few-shot Language Learners

This work proposes a supervised contrastive framework that clusters inputs from the same class under different augmented “views” and repels those from different classes, improving the generality of models trained with only limited examples.
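A minimal pure-Python sketch of a supervised contrastive (SupCon-style) loss, the kind of objective the summary describes: embeddings that share a label (for example, two augmented views of the same class) are pulled together and the rest pushed apart. The temperature `tau` and the toy vectors are assumptions; the paper's exact loss and view construction may differ.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def supcon_loss(embeddings, labels, tau=0.1):
    """Mean over anchors of -avg_{positives} log softmax(cosine similarity / tau)."""
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        # Similarities from anchor i to every other sample (the anchor itself is excluded).
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / tau for a in range(n) if a != i}
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))
        positives = [p for p in sims if labels[p] == labels[i]]
        if not positives:
            continue  # anchors without a same-class partner contribute nothing
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors


well_clustered = supcon_loss([[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]], [0, 0, 1, 1])
mixed = supcon_loss([[1, 0], [0, 1], [0.9, 0.1], [0.1, 0.9]], [0, 0, 1, 1])
print(well_clustered, mixed)
```

The loss is lower when same-class embeddings are already close, which is the pressure that groups views of the same class together during training.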



Language Models are Few-Shot Learners

GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
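In the few-shot setting the model is conditioned only on text: K solved demonstrations followed by an unsolved query, with no weight updates. A sketch of how a 3-digit-addition prompt might be assembled; the `Q:`/`A:` format and the helper name are assumptions rather than the exact GPT-3 evaluation format, and no model is called.

```python
def few_shot_prompt(demos, query, k=None):
    """Build a prompt from (a, b) addition demonstrations plus an unanswered query."""
    shots = demos if k is None else demos[:k]
    blocks = [f"Q: What is {a} plus {b}?\nA: {a + b}" for a, b in shots]
    blocks.append(f"Q: What is {query[0]} plus {query[1]}?\nA:")
    return "\n\n".join(blocks)


print(few_shot_prompt([(123, 456), (702, 118)], (348, 219)))
```

The model's continuation after the final `A:` would be read off as its answer; setting `k=0` gives the corresponding zero-shot prompt.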

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
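The text-to-text framing casts every task as mapping an input string to an output string, with a task prefix telling the model what to do. The prefixes below follow examples given in the T5 paper (`translate English to German:`, `cola sentence:`, `stsb sentence1:`/`sentence2:`, `summarize:`), as does turning STS-B regression targets into strings rounded to the nearest 0.2; the helper functions themselves are hypothetical, not T5's preprocessing code.

```python
def to_text_to_text(task, **fields):
    """Render one task instance as a single prefixed input string (hypothetical helper)."""
    if task == "translate_en_de":
        return f"translate English to German: {fields['text']}"
    if task == "cola":
        return f"cola sentence: {fields['sentence']}"
    if task == "stsb":
        return f"stsb sentence1: {fields['sentence1']} sentence2: {fields['sentence2']}"
    if task == "summarize":
        return f"summarize: {fields['text']}"
    raise ValueError(f"unknown task: {task}")


def stsb_target(score):
    """Round a similarity score to the nearest 0.2 and emit it as a string target."""
    return f"{round(score * 5) / 5:.1f}"


print(to_text_to_text("translate_en_de", text="That is good."))
print(stsb_target(2.57))  # 2.6
```

Because inputs and outputs are both plain strings, the same model, loss, and decoding procedure can be reused across every task.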

Annotating Expressions of Opinions and Emotions in Language

The manual annotation process and the results of an inter-annotator agreement study on a 10,000-sentence corpus of articles drawn from the world press are presented.
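Inter-annotator agreement is usually summarized with a chance-corrected statistic. As an illustration only, here is Cohen's kappa for two annotators labelling the same items; the labels are made up, and this is not a claim about the exact agreement measures reported in the paper.

```python
from collections import Counter


def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Agreement expected by chance, from each annotator's label distribution.
    expected = sum(ca[label] * cb[label] for label in ca.keys() | cb.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


print(cohens_kappa(["pos", "pos", "neg", "neg"], ["pos", "neg", "neg", "neg"]))
```

Here raw agreement is 0.75 but chance agreement is 0.5, so kappa is 0.5, which is why chance correction matters when label distributions are skewed.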

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

Empirical Methods in Natural Language Processing and International Joint Conference on Natural Language Processing, 2019
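Sentence-BERT obtains a fixed-size sentence vector by pooling the token outputs of a BERT encoder shared across a Siamese pair (mean pooling is its default), then compares sentences with cosine similarity. In this sketch the token vectors are made up; in practice they would come from the fine-tuned encoder.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average token embeddings, skipping padding positions (mask == 0)."""
    dim = len(token_vectors[0])
    kept = [v for v, m in zip(token_vectors, attention_mask) if m]
    return [sum(v[d] for v in kept) / len(kept) for d in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


s1 = mean_pool([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]], [1, 1, 0])  # third token is padding
s2 = mean_pool([[2.0, 3.0], [2.0, 3.0]], [1, 1])
print(s1, cosine(s1, s2))
```

Because each sentence is embedded independently, a corpus can be encoded once and compared pairwise with cheap vector operations instead of running the encoder on every pair.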

Language Models are Unsupervised Multitask Learners

This work demonstrates that language models begin to learn NLP tasks such as question answering, machine translation, reading comprehension, and summarization without any explicit supervision when trained on WebText, a new dataset of millions of webpages. This suggests a promising path towards building language processing systems that learn to perform tasks from their naturally occurring demonstrations.

Improving Language Understanding by Generative Pre-Training

The general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, improving upon the state of the art in 9 out of the 12 tasks studied.

Neural Network Acceptability Judgments

This paper introduces the Corpus of Linguistic Acceptability (CoLA), a set of 10,657 English sentences from published linguistics literature, each labeled as grammatical or ungrammatical. It trains several recurrent neural network models on acceptability classification and finds that they outperform the unsupervised models of Lau et al. (2016) on CoLA.
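Acceptability classification has imbalanced labels, so CoLA is commonly scored with the Matthews correlation coefficient (MCC) rather than accuracy (GLUE uses MCC for CoLA). A self-contained sketch; the toy labels are invented.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels (1 = acceptable, 0 = unacceptable)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Always predicting "acceptable" gets 60% accuracy here but an MCC of 0.
print(matthews_corrcoef([1, 1, 1, 0, 0], [1, 1, 1, 1, 1]))
```

MCC ranges from -1 to 1 and stays at 0 for a majority-class predictor, which is why it is preferred over accuracy on skewed data.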

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

GLUE combines a benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models' understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models. It favors models that represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge transfer across tasks.
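GLUE summarizes a model with a single macro-average: for tasks reported with more than one metric (for example, F1 and accuracy), the metrics are averaged first, and the per-task scores are then averaged. A sketch with made-up numbers and a hypothetical helper name:

```python
def glue_score(task_metrics):
    """Macro-average over tasks; each task maps to a list of its metric values (0-100)."""
    if not task_metrics:
        raise ValueError("no tasks given")
    per_task = {task: sum(vals) / len(vals) for task, vals in task_metrics.items()}
    return sum(per_task.values()) / len(per_task)


print(glue_score({"CoLA": [50.0], "MRPC": [80.0, 90.0], "SST-2": [95.0]}))
```

Averaging within a task first keeps tasks with two metrics from counting twice in the overall score.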

A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

This paper introduces the Multi-Genre Natural Language Inference (MultiNLI) corpus, a dataset designed for developing and evaluating machine learning models for sentence understanding, and shows that it is a substantially more difficult task than the Stanford NLI corpus.

SQuAD: 100,000+ Questions for Machine Comprehension of Text

The authors build a strong logistic regression model that achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).
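SQuAD's F1 treats the predicted and gold answers as bags of tokens and takes the harmonic mean of token-level precision and recall after normalization (lower-casing, stripping punctuation and the articles a/an/the); with several gold answers, the maximum over them is used. The sketch below follows the spirit of the official evaluation script but is not a verbatim copy of it.

```python
import re
import string
from collections import Counter

PUNCT = set(string.punctuation)


def normalize_answer(s):
    """Lower-case, drop punctuation and articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def f1_score(prediction, ground_truth):
    """Token-overlap F1 between one prediction and one gold answer."""
    pred = normalize_answer(prediction).split()
    gold = normalize_answer(ground_truth).split()
    num_same = sum((Counter(pred) & Counter(gold)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred)
    recall = num_same / len(gold)
    return 2 * precision * recall / (precision + recall)


def max_f1(prediction, ground_truths):
    """Best F1 against any of the gold answers."""
    return max(f1_score(prediction, gt) for gt in ground_truths)


print(f1_score("the Eiffel Tower", "Eiffel Tower in Paris"))
```

Partial credit is the point of this metric: a prediction that covers half of a long gold span still scores above zero, unlike exact match.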