Corpus ID: 218971783

Language Models are Few-Shot Learners

@article{Brown2020LanguageMA,
  title={Language Models are Few-Shot Learners},
  author={Tom B. Brown and Benjamin Mann and Nick Ryder and Melanie Subbiah and Jared Kaplan and Prafulla Dhariwal and Arvind Neelakantan and Pranav Shyam and Girish Sastry and Amanda Askell and Sandhini Agarwal and Ariel Herbert-Voss and Gretchen Krueger and T. J. Henighan and Rewon Child and Aditya Ramesh and Daniel M. Ziegler and Jeff Wu and Clemens Winter and Christopher Hesse and Mark Chen and Eric Sigler and Mateusz Litwin and Scott Gray and Benjamin Chess and Jack Clark and Christopher Berner and Sam McCandlish and Alec Radford and Ilya Sutskever and Dario Amodei},
  journal={ArXiv},
  year={2020},
  volume={abs/2005.14165}
}
Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle… 
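As a rough illustration of the few-shot, in-context setting the paper studies (the task is conveyed entirely through the prompt, with no gradient updates or task-specific fine-tuning), a minimal Python sketch follows. The sentiment task, demonstrations, and label set are illustrative assumptions, not examples taken from the paper.

    # Minimal sketch of few-shot "in-context learning": the task is specified
    # purely through a natural-language prompt containing a handful of labelled
    # demonstrations. The demonstrations below are illustrative assumptions.

    demonstrations = [
        ("The film was a breathtaking triumph.", "positive"),
        ("I walked out halfway through.", "negative"),
        ("An instant classic that I will rewatch.", "positive"),
    ]

    def build_few_shot_prompt(query: str) -> str:
        """Concatenate K labelled demonstrations followed by the unlabelled query."""
        lines = ["Classify the sentiment of each review as positive or negative.", ""]
        for text, label in demonstrations:
            lines.append(f"Review: {text}\nSentiment: {label}\n")
        lines.append(f"Review: {query}\nSentiment:")
        return "\n".join(lines)

    print(build_few_shot_prompt("The plot made no sense at all."))
    # Whatever the language model generates after the final "Sentiment:" is
    # taken as its prediction for the new example.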
Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks
TLDR
This paper proposes a self-supervised approach to generate a large, rich, meta-learning task distribution from unlabeled text, and shows that this meta-training leads to better few-shot generalization than language-model pre-training followed by finetuning.
Exploring Large Language Models in a Limited Resource Scenario
TLDR
The proposed GPT-2-based system was observed to capture the sentiment of a given text well, reaching an accuracy of 82% on a subset of the IMDB movie review dataset.
DReCa: A General Task Augmentation Strategy for Few-Shot Natural Language Inference
TLDR
DReCa (Decomposing datasets into Reasoning Categories) is a simple method for discovering and using latent reasoning categories in a dataset to form additional high-quality tasks that improve the accuracy of meta-learners.
Towards Few-shot Fact-Checking via Perplexity
TLDR
A new way of utilizing the powerful transfer learning ability of a language model via a perplexity score is proposed; it already outperforms the Major Class baseline by more than an absolute 10% on the F1-Macro metric across multiple datasets (a sketch of perplexity scoring with a pretrained language model appears after this list).
When Do You Need Billions of Words of Pretraining Data?
TLDR
While the ability to encode linguistic features is almost certainly necessary for language understanding, it is likely that other, unidentified, forms of knowledge are the major drivers of recent improvements in language understanding among large pretrained models.
Few-shot Sequence Learning with Transformers
TLDR
This work investigates few-shot learning in the setting where the data points are sequences of tokens and proposes an efficient learning algorithm based on Transformers that works at least as well as other methods, while being more computationally efficient.
Learning from Task Descriptions
TLDR
This work introduces a framework for developing NLP systems that solve new tasks after reading their descriptions, synthesizing prior work in this area, and instantiates it with a new English language dataset, ZEST, structured for task-oriented evaluation on unseen tasks.
It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
TLDR
This work shows that performance similar to GPT-3 can be obtained with language models that are much “greener” in that their parameter count is several orders of magnitude smaller, and identifies key factors required for successful natural language understanding with small language models.
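The perplexity-based fact-checking entry above relies on the perplexity an off-the-shelf language model assigns to a piece of text. A minimal sketch using the Hugging Face transformers GPT-2 interface follows; the evidence-plus-claim prompt format and any decision threshold are assumptions for illustration, not the cited paper's exact recipe.

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    # Score text by the perplexity a pretrained causal LM assigns to it
    # (lower perplexity roughly means the text is more "expected").
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def perplexity(text: str) -> float:
        enc = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            # Passing labels makes the model return the mean cross-entropy loss.
            out = model(input_ids=enc.input_ids, labels=enc.input_ids)
        return torch.exp(out.loss).item()

    evidence = "The Eiffel Tower is located in Paris, France."
    print(perplexity(evidence + " Claim: The Eiffel Tower is in Paris."))
    print(perplexity(evidence + " Claim: The Eiffel Tower is in Berlin."))
    # A threshold on such scores could separate supported from unsupported
    # claims; the threshold and prompt format here are assumptions.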

References

Showing 1-10 of 146 references
Language Models are Unsupervised Multitask Learners
TLDR
It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
Improving Language Understanding by Generative Pre-Training
TLDR
The general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon the state of the art in 9 out of the 12 tasks studied.
Universal Language Model Fine-tuning for Text Classification
TLDR
This work proposes Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduces techniques that are key for fine-tuning a language model.
Story Ending Prediction by Transferable BERT
TLDR
This study investigates a transferable BERT (TransBERT) training framework, which can transfer not only general language knowledge from large-scale unlabeled data but also specific kinds of knowledge from various semantically related supervised tasks, for a target task.
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
TLDR
This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
Multi-Task Deep Neural Networks for Natural Language Understanding
TLDR
A Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks that allows domain adaptation with substantially fewer in-domain labels than the pre-trained BERT representations.
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
TLDR
A general-purpose fine-tuning recipe for retrieval-augmented generation (RAG) -- models which combine pre-trained parametric and non-parametric memory for language generation, and finds that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
REALM: Retrieval-Augmented Language Model Pre-Training
TLDR
The effectiveness of Retrieval-Augmented Language Model pre-training (REALM) is demonstrated by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA) and is found to outperform all previous methods by a significant margin, while also providing qualitative benefits such as interpretability and modularity.
Exploring the Limits of Language Modeling
TLDR
This work explores recent advances in Recurrent Neural Networks for large-scale language modeling and extends current models to deal with two key challenges present in this task: corpus and vocabulary sizes, and the complex, long-term structure of language.