• Corpus ID: 237353222

Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners

Ningyu Zhang, Luoqiu Li, Xiang Chen, Shumin Deng, Zhen Bi, Chuanqi Tan, Fei Huang, Huajun Chen

Large-scale pre-trained language models have contributed significantly to natural language processing by demonstrating remarkable abilities as few-shot learners. However, their effectiveness depends mainly on scaling the model parameters and prompt design, hindering their implementation in most real-world applications. This study proposes a novel pluggable, extensible, and efficient approach named DifferentiAble pRompT (DART), which can convert small language models into better few-shot… 

Generating Training Data with Language Models: Towards Zero-Shot Language Understanding

This paper presents a simple approach that uses both types of PLMs for fully zero-shot learning of NLU tasks without requiring any task-specific data: a unidirectional PLM generates class-conditioned texts guided by prompts, which are used as the training data for fine-tuning a bidirectional PLM.

Contrastive Demonstration Tuning for Pre-trained Language Models

This work proposes contrastive demonstration tuning, a pluggable, extensible, and efficient approach that requires no demonstration sampling, can be plugged into any previous prompt-tuning approach, and extends to classification tasks with a large number of categories.

Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning

This work first tunes an autoregressive PLM on the few-shot samples and then uses it as a generator to synthesize a large amount of novel training samples which augment the original training set, achieving an overall better result across seven classification tasks of the GLUE benchmark than existing few-shot learning methods.

Pre-trained Token-replaced Detection Model as Few-shot Learner

This paper proposes a novel approach to few-shot learning with pre-trained token-replaced detection models like ELECTRA, and demonstrates that this approach outperforms few-shot learners with pre-trained masked language models in both one-sentence and two-sentence learning tasks.

A Survey of Knowledge-Enhanced Pre-trained Language Models

This survey presents a comprehensive review of Knowledge-Enhanced Pre-trained Language Models (KE-PLMs) to provide clear insight into this thriving field, and introduces separate taxonomies for Natural Language Understanding (NLU) and Natural Language Generation (NLG) to highlight these two main tasks of NLP.

Don't Stop Pretraining? Make Prompt-based Fine-tuning Powerful Learner

Language models (LMs) trained on vast quantities of unlabelled data have greatly advanced the field of natural language processing (NLP). In this study, we re-visit the widely accepted notion in NLP

Few-shot Learning with Multilingual Language Models

This work trains multilingual generative language models on a corpus covering a diverse set of languages, and conducts an in-depth analysis of different multilingual prompting approaches, showing in particular that strong few-shot learning performance across languages can be achieved via cross-lingual transfer through both templates and demonstration examples.

Scalable Prompt Generation for Semi-supervised Learning with Language Models

This work proposes two methods that automatically design multiple prompts and integrate an automatic verbalizer in semi-supervised learning (SSL) settings without sacrificing performance; the prototypical verbalizer is used to replace the manual one.

Few-shot Learning with Multilingual Generative Language Models

This work trains multilingual generative language models on a corpus covering a diverse set of languages, and shows in particular that strong few-shot learning performance across languages can be achieved via cross-lingual transfer through both templates and demonstration examples.

Prototypical Verbalizer for Prompt-based Few-shot Tuning

This work proposes the prototypical verbalizer (ProtoVerb) which is built directly from training data and demonstrates that ProtoVerb significantly outperforms current automatic verbalizers, especially when training data is extremely scarce.

Making Pre-trained Language Models Better Few-shot Learners

The LM-BFF approach makes minimal assumptions on task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning.

Entailment as Few-Shot Learner

This work reformulates a potential NLP task as an entailment task and then fine-tunes the model with as few as 8 examples, improving on various existing SOTA few-shot learning methods by 12% and yielding few-shot performance competitive with models 500 times larger, such as GPT-3.

Language Models are Few-Shot Learners

GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.

Unified Language Model Pre-training for Natural Language Understanding and Generation

UniLM is a new unified pre-trained language model that can be fine-tuned for both natural language understanding and generation tasks; it compares favorably with BERT on the GLUE benchmark and on the SQuAD 2.0 and CoQA question answering tasks.

Learning How to Ask: Querying LMs with Mixtures of Soft Prompts

This work explores the idea of learning prompts by gradient descent—either fine-tuning prompts taken from previous work, or starting from random initialization, showing that the implicit factual knowledge in language models was previously underestimated.

Prompt-Learning for Fine-Grained Entity Typing

This work develops a simple and effective prompt-learning pipeline, and proposes a self-supervised strategy that carries out distribution-level optimization in prompt-learning to automatically summarize the information of entity types in the zero-shot regime.

Learning to Few-Shot Learn Across Diverse Natural Language Classification Tasks

LEOPARD is trained with the state-of-the-art transformer architecture and shows better generalization to tasks not seen at all during training, with as few as 4 examples per label, than self-supervised pre-training or multi-task training.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

PTR: Prompt Tuning with Rules for Text Classification