Corpus ID: 237416585

Finetuned Language Models Are Zero-Shot Learners

@article{Wei2021FinetunedLM,
  title={Finetuned Language Models Are Zero-Shot Learners},
  author={Jason Wei and Maarten Bosma and Vincent Zhao and Kelvin Guu and Adams Wei Yu and Brian Lester and Nan Du and Andrew M. Dai and Quoc V. Le},
  journal={ArXiv},
  year={2021},
  volume={abs/2109.01652}
}
Abstract: This paper explores a simple method for improving the zero-shot learning abilities of language models. We show that instruction tuning (finetuning language models on a collection of datasets described via instructions) substantially improves zero-shot performance on unseen tasks. We take a 137B parameter pretrained language model and instruction tune it on over 60 NLP datasets verbalized via natural language instruction templates. We evaluate this instruction-tuned model, which we call…
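To make the verbalization step concrete, here is a minimal sketch (hypothetical Python templates, not the paper's actual FLAN templates) of how a labeled NLI example might be rewritten into instruction-plus-target pairs for instruction tuning:

```python
# Illustrative sketch of instruction-tuning data construction. The templates
# below are hypothetical; the paper uses many templates per dataset across
# 60+ datasets.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NLIExample:
    premise: str
    hypothesis: str
    label: str  # e.g. "yes", "no", "maybe"


# Alternative phrasings of the same task; instruction tuning mixes many such
# templates so the model learns to follow instructions rather than one format.
TEMPLATES: List[str] = [
    "Premise: {premise}\nHypothesis: {hypothesis}\n"
    "Does the premise entail the hypothesis? Answer yes, no, or maybe.",
    "{premise}\nBased on the paragraph above, is it true that \"{hypothesis}\"? "
    "Answer yes, no, or maybe.",
]


def verbalize(example: NLIExample, template: str) -> Dict[str, str]:
    """Turn a raw labeled example into an (instruction, target) text pair."""
    return {
        "input": template.format(premise=example.premise, hypothesis=example.hypothesis),
        "target": example.label,
    }


if __name__ == "__main__":
    ex = NLIExample("A man plays a guitar on stage.", "A person is performing music.", "yes")
    for t in TEMPLATES:
        pair = verbalize(ex, t)
        print(pair["input"], "->", pair["target"])
```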
Generating Training Data with Language Models: Towards Zero-Shot Language Understanding
TLDR
This paper presents a simple approach that uses both types of PLMs for fully zero-shot learning of NLU tasks without requiring any task-specific data: a unidirectional PLM generates class-conditioned texts guided by prompts, which are used as the training data for fine-tuning a bidirectional PLM.
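As a rough illustration of that pipeline (a sketch only; the prompts and helper names below are assumptions, not the paper's), a generative model writes class-conditioned texts from label-describing prompts, and the synthetic pairs then serve as supervised training data for a bidirectional encoder:

```python
# Sketch of prompt-guided synthetic data generation for zero-shot NLU.
# `generate_text` stands in for any unidirectional PLM's sampling interface.
from typing import Callable, Dict, List, Tuple

# Hypothetical label-conditioned prompts for a sentiment task.
CLASS_PROMPTS: Dict[str, str] = {
    "positive": "Write a glowing, positive movie review:",
    "negative": "Write a harsh, negative movie review:",
}


def synthesize_dataset(
    generate_text: Callable[[str], str],
    samples_per_class: int = 100,
) -> List[Tuple[str, str]]:
    """Build a synthetic labeled dataset by sampling from class-conditioned prompts."""
    data: List[Tuple[str, str]] = []
    for label, prompt in CLASS_PROMPTS.items():
        for _ in range(samples_per_class):
            data.append((generate_text(prompt), label))
    return data


# The (text, label) pairs would then fine-tune a bidirectional encoder
# (e.g. a BERT-style classifier) with ordinary supervised training.
```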
Continual-T0: Progressively Instructing 50+ Tasks to Language Models Without Forgetting
TLDR
The resulting model, Continual-T0 (CT0), is able to learn diverse new tasks while still maintaining good performance on previous tasks, spanning 70 datasets in total.
DeepStruct: Pretraining of Language Models for Structure Prediction
TLDR
It is shown that a 10B parameter language model transfers non-trivially to most tasks and obtains state-of-the-art performance on 21 of the 28 datasets evaluated.
Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning
TLDR
This work introduces InstructDial, an instruction tuning framework for dialogue consisting of a repository of 48 diverse dialogue tasks in a unified text-to-text format created from 59 openly available dialogue datasets; it enables good zero-shot performance on unseen datasets and tasks such as dialogue evaluation and intent detection, and even better performance in a few-shot setting.
Few-shot Adaptation Works with UnpredicTable Data
TLDR
This work automatically extracts 413,299 tasks from internet tables, orders of magnitude more than the next-largest public datasets, and finds that narrow subsets of the authors' dataset sometimes outperform more diverse datasets.
Prompt Consistency for Zero-Shot Task Generalization
TLDR
This paper takes advantage of the fact that multiple prompts can be used to specify a single task, and proposes to regularize prompt consistency, encouraging consistent predictions over this diverse set of prompts, to improve zero-shot performance.
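A simplified reading of that regularizer (hypothetical code, not the paper's exact objective): given the model's predicted class distributions for one input under several prompts, penalize pairwise disagreement, for example with a symmetric KL term added to the training loss:

```python
# Toy prompt-consistency penalty over per-prompt class distributions.
import itertools
from typing import List

import numpy as np


def kl(p: np.ndarray, q: np.ndarray, eps: float = 1e-8) -> float:
    """KL divergence between two discrete distributions (smoothed for stability)."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))


def prompt_consistency_loss(distributions: List[np.ndarray]) -> float:
    """Average symmetric KL over all pairs of per-prompt predictions."""
    pairs = list(itertools.combinations(distributions, 2))
    return sum(kl(p, q) + kl(q, p) for p, q in pairs) / len(pairs)


# Three prompts give slightly different predictions for the same example;
# the penalty would be added to the task loss during tuning.
preds = [np.array([0.7, 0.3]), np.array([0.6, 0.4]), np.array([0.8, 0.2])]
print(prompt_consistency_loss(preds))
```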
Overcoming Catastrophic Forgetting in Zero-Shot Cross-Lingual Generation
TLDR
Three approaches to improving cross-lingual transfer are explored; mixing in unlabeled multilingual data, pre-training prompts on target-language data, and explicitly factoring prompts into recombinable language and task components each provide further quality gains, suggesting that robust zero-shot cross-lingual generation is within reach.
Language Models in the Loop: Incorporating Prompting into Weak Supervision
TLDR
The experimental evaluation shows that prompting large language models within a weak supervision framework can provide gains in accuracy, and that this approach produces classifiers with comparable or superior accuracy to those trained from hand-engineered rules.
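One way to picture that setup (a toy sketch with hypothetical prompts and a plain majority vote in place of a learned label model): each prompt applied to the LM acts as a labeling function, and the per-prompt answers are aggregated into weak training labels:

```python
# Prompted LM answers treated as weak-supervision sources. A real system
# would combine sources with a label model rather than a simple vote.
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical yes/no prompts probing the same binary label.
PROMPTS: List[str] = [
    "Is the following review positive? Answer yes or no.\n{text}",
    "Does the writer recommend this product? Answer yes or no.\n{text}",
    "Would the author watch this movie again? Answer yes or no.\n{text}",
]


def weak_label(text: str, answer: Callable[[str], str]) -> Optional[str]:
    """Aggregate per-prompt LM answers into one weak label by majority vote."""
    votes = [answer(p.format(text=text)).strip().lower() for p in PROMPTS]
    label, count = Counter(votes).most_common(1)[0]
    # Abstain when no answer wins a strict majority.
    return label if count > len(PROMPTS) // 2 else None
```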
PaLM: Scaling Language Modeling with Pathways
TLDR
A 540-billion parameter, densely activated Transformer language model called PaLM achieves breakthrough performance, outperforming the state of the art on a suite of multi-step reasoning tasks and exceeding average human performance on the recently released BIG-bench benchmark.
Cross-Task Generalization via Natural Language Crowdsourcing Instructions
TLDR
This work introduces NATURAL INSTRUCTIONS, a dataset of 61 distinct tasks, their human-authored instructions, and 193k task instances, and uses generative pre-trained language models to encode task-specific instructions along with the input and generate the task output.
...

References

Showing 1-10 of 187 references
Language Models are Unsupervised Multitask Learners
TLDR
It is demonstrated that language models begin to learn NLP tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
Making Pre-trained Language Models Better Few-shot Learners
TLDR
The LM-BFF approach makes minimal assumptions on task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning.
Language Models are Few-Shot Learners
TLDR
GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
Multitask Prompted Training Enables Zero-Shot Task Generalization
TLDR
A system for easily mapping any natural language task into a human-readable prompted form is developed, and a pretrained encoder-decoder model is fine-tuned on this multitask mixture covering a wide variety of tasks.
Improving Language Understanding by Generative Pre-Training
TLDR
The general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, improving upon the state of the art in 9 out of the 12 tasks studied.
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
TLDR
This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
TLDR
This paper proposes and develops a family of language models named GLaM, which uses a sparsely activated mixture-of-experts architecture to scale the model capacity while also incurring substantially less training cost compared to dense variants.
Towards Zero-Label Language Learning
TLDR
This paper presents a training data creation procedure named Unsupervised Data Generation (UDG), which leverages few-shot prompts to synthesize high-quality training data without real human annotations, achieving new state-of-the-art results on the SuperGLUE benchmark.
Training language models to follow instructions with human feedback
TLDR
The results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent, yielding improvements in truthfulness and reductions in toxic output generation while incurring minimal performance regressions on public NLP datasets.
CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP
TLDR
This paper presents the NLP Few-shot Gym, a repository of 160 diverse few-shot NLP tasks created from open-access NLP datasets and converted to a unified text-to-text format, and reveals that few-shot learning ability on unseen tasks can be improved via an upstream learning stage using a set of seen tasks.
...