Corpus ID: 237416585

Finetuned Language Models Are Zero-Shot Learners

Jason Wei, Maarten Bosma, Vincent Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le
Abstract: This paper explores a simple method for improving the zero-shot learning abilities of language models. We show that instruction tuning (finetuning language models on a collection of datasets described via instructions) substantially improves zero-shot performance on unseen tasks. We take a 137B-parameter pretrained language model and instruction-tune it on over 60 NLP datasets verbalized via natural language instruction templates. We evaluate this instruction-tuned model, which we call…
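The verbalization step described in the abstract can be sketched as follows. This is a minimal illustration only; the templates, field names, and example below are hypothetical stand-ins, not the paper's actual templates.

```python
# Minimal sketch of verbalizing an NLI example via natural language
# instruction templates, in the spirit of instruction tuning.
# Templates and field names here are illustrative, not the paper's own.

NLI_TEMPLATES = [
    "Premise: {premise}\nHypothesis: {hypothesis}\n"
    "Does the premise entail the hypothesis? Answer yes, no, or maybe.",
    "{premise}\nBased on the passage above, is it true that "
    '"{hypothesis}"? yes, no, or maybe?',
]

def verbalize(example: dict, template: str) -> str:
    """Render one dataset example into an instruction-formatted prompt."""
    return template.format(**example)

example = {"premise": "A dog is running in the park.",
           "hypothesis": "An animal is outside."}

# One underlying example yields several instruction-formatted prompts.
prompts = [verbalize(example, t) for t in NLI_TEMPLATES]
```

In this setup each dataset contributes many templates, so the model sees the same task phrased in varied natural-language instructions during finetuning.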

Fine-tuned Language Models are Continual Learners

It is shown that fine-tuned language models can be continual learners and that continual learning emerges from self-supervised pre-training, demonstrating some level of instruction compositionality.

Zero-Shot Learners for Natural Language Understanding via a Unified Multiple Choice Perspective

This work proposes a new paradigm for zero-shot learners that is format agnostic, i.e., it is compatible with any format and applicable to a list of language tasks, such as text classification, commonsense reasoning, coreference resolution, and sentiment analysis, and significantly reduces the number of parameters.

Generating Training Data with Language Models: Towards Zero-Shot Language Understanding

This paper presents a simple approach that uses both types of PLMs for fully zero-shot learning of NLU tasks without requiring any task-specific data: a unidirectional PLM generates class-conditioned texts guided by prompts, which are used as the training data for a bidirectional PLM.
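The class-conditioned generation pipeline can be sketched as follows. The prompts, labels, and `sample_continuation` stub are hypothetical placeholders; a real system would decode continuations from an actual unidirectional PLM.

```python
# Sketch of zero-shot training-data synthesis: a class-conditioned prompt
# steers a generative model, and the generated text becomes a labeled
# example for training a second (classification) model.

import random

# Hypothetical class-conditioned prompts, one per target label.
PROMPTS = {
    "positive": "Write a positive movie review:",
    "negative": "Write a negative movie review:",
}

def sample_continuation(prompt: str, rng: random.Random) -> str:
    """Toy stand-in for sampling from a unidirectional PLM."""
    stems = {"positive": ["A delightful film.", "Loved every minute."],
             "negative": ["A tedious mess.", "Painfully dull."]}
    label = "positive" if "positive" in prompt else "negative"
    return rng.choice(stems[label])

def synthesize_dataset(n_per_class: int, seed: int = 0) -> list:
    """Build a synthetic labeled dataset from class-conditioned prompts."""
    rng = random.Random(seed)
    data = []
    for label, prompt in PROMPTS.items():
        for _ in range(n_per_class):
            data.append({"text": sample_continuation(prompt, rng),
                         "label": label})
    return data

train_set = synthesize_dataset(n_per_class=2)
```

The key design point is that the label is encoded in the prompt, so the generated text comes pre-labeled and no human annotation is needed before training the downstream model.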

Boosting Natural Language Generation from Instructions with Meta-Learning

This paper proposes to adapt meta-learning to MTIL in three directions: 1) Model-Agnostic Meta-Learning (MAML); 2) hyper-network-based adaptation (HNet), which generates task-specific parameters conditioned on instructions; and 3) an approach combining HNet and MAML.

Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners

FLIPPED gives particularly large improvements on unseen labels, outperforming T0-11B by up to +20% average F1 score, indicating that the strong task generalization of FLIPPED comes from improved generalization to novel labels.

DeepStruct: Pretraining of Language Models for Structure Prediction

It is shown that a 10B parameter language model transfers non-trivially to most tasks and obtains state-of-the-art performance on 21 of 28 datasets that are evaluated.

Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning

This work introduces InstructDial, an instruction tuning framework for dialogue consisting of a repository of 48 diverse dialogue tasks in a unified text-to-text format created from 59 openly available dialogue datasets, and reveals that it enables good zero-shot performance on unseen datasets and tasks such as dialogue evaluation and intent detection, and even better performance in a few-shot setting.

Learning Instructions with Unlabeled Data for Zero-Shot Cross-Task Generalization

This work empirically explores how instruction tuning (IT) performance trends with the number of labeled examples, instructions, and training tasks, and comprehensively analyzes the key factors of UDIT to investigate how to better improve IT with unlabeled data.

ZeroGen: Efficient Zero-shot Learning via Dataset Generation

It is argued that ZeroGen can also provide useful insights from the perspective of data-free, model-agnostic knowledge distillation and unreferenced text-generation evaluation, in addition to being annotation-free and efficient.

Overcoming Catastrophic Forgetting in Zero-Shot Cross-Lingual Generation

Experiments show that parameter-efficient prompt tuning provides gains over standard prompt tuning when transferring between less-related languages, e.g., from English to Thai, suggesting that robust zero-shot cross-lingual generation is within reach.

Language Models are Unsupervised Multitask Learners

It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.

Making Pre-trained Language Models Better Few-shot Learners

The LM-BFF approach makes minimal assumptions on task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning.

Language Models are Few-Shot Learners

GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.

Multitask Prompted Training Enables Zero-Shot Task Generalization

A system is introduced for easily mapping any natural language task into a human-readable prompted form, and a pretrained encoder-decoder model is fine-tuned on this multitask mixture covering a wide variety of tasks.

Improving Language Understanding by Generative Pre-Training

The general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, improving upon the state of the art in 9 out of the 12 tasks studied.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

This paper proposes and develops a family of language models named GLaM, which uses a sparsely activated mixture-of-experts architecture to scale the model capacity while also incurring substantially less training cost compared to dense variants.

Towards Zero-Label Language Learning

This paper presents a training data creation procedure named Unsupervised Data Generation (UDG), which leverages few-shot prompts to synthesize high-quality training data without real human annotations, achieving new state-of-the-art results on the SuperGLUE benchmark.

Training language models to follow instructions with human feedback

The results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent, yielding improvements in truthfulness and reductions in toxic output generation while incurring minimal performance regressions on public NLP datasets.

CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP

This paper presents the NLP Few-shot Gym, a repository of 160 diverse few-shot NLP tasks created from open-access NLP datasets and converted to a unified text-to-text format, and reveals that few-shot learning ability on unseen tasks can be improved via an upstream learning stage using a set of seen tasks.