What Makes Pre-trained Language Models Better Zero/Few-shot Learners?

  title={What Makes Pre-trained Language Models Better Zero/Few-shot Learners?},
  author={Jinghui Lu and Rui Zhao and Brian Mac Namee and Dongsheng Zhu and Weidong Han and Fei Tan},
In this paper, we propose a theoretical framework to explain the efficacy of prompt learning in zero/few-shot scenarios. First, we prove that conventional pre-training and fine-tuning paradigm fails in few-shot scenarios due to overfitting the unrepresentative labelled data. We then detail the assumption that prompt learning is more effective because it empowers pre-trained language model that is built upon massive text corpora, as well as domain-related human knowledge to participate more in… 

PUnifiedNER: a Prompting-based Unified NER System for Diverse Datasets

This work presents a “versatile” model—the Prompting-based Unified NER system (PUnifiedNER)—that works with data from different domains and can recognise up to 37 entity types simultaneously, and theoretically it could be as many as possible.



Making Pre-trained Language Models Better Few-shot Learners

The LM-BFF approach makes minimal assumptions on task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning.

NSP-BERT: A Prompt-based Few-Shot Learner through an Original Pre-training Task —— Next Sentence Prediction

This paper presents an NSP-tuning approach with binary cross-entropy loss for single-sentence classification tasks that is competitive compared to PET and EFL and indicates that the pre-training corpus is another important determinant of few-shot besides model size and prompt method.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

Language Models are Few-Shot Learners

GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.

FewCLUE: A Chinese Few-shot Learning Evaluation Benchmark

This work introduces Chinese Few-shot Learning Evaluation Benchmark (FewCLUE), the first comprehensive small sample evaluation benchmark in Chinese, and implements a set of state-of-the-art few-shot learning methods (including PET, ADAPET, LM-BFF, P-tuning and EFL), and compares their performance with fine- Tuning and zero-shotLearning schemes on the newly constructed FewCLUE benchmark.

GPT Understands, Too

It is shown that GPTs can be better than or comparable to similar-sized BERTs on NLU tasks with a novel method P-tuning— which employs trainable continuous prompt embeddings and outperforms the state-of-the-art approaches on the few-shot SuperGlue benchmark.

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

A unified set of mathematical notations that can cover a wide variety of existing work, and organize existing work along several dimensions, e.g. the choice of pre-trained language models, prompts, and tuning strategies are described.

The Power of Scale for Parameter-Efficient Prompt Tuning

This work explores “prompt tuning”, a simple yet effective mechanism for learning “soft prompts” to condition frozen language models to perform specific downstream tasks, and shows that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning.

It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners

This work shows that performance similar to GPT-3 can be obtained with language models that are much “greener” in that their parameter count is several orders of magnitude smaller, and identifies key factors required for successful natural language understanding with small language models.

What Makes Good In-Context Examples for GPT-3?

This work proposes to retrieve examples that are semantically-similar to a test query sample to formulate its corresponding prompt, and evaluates the proposed approach on several natural language understanding and generation benchmarks, where the retrieval-based prompt selection approach consistently outperforms the random selection baseline.