MetaICL: Learning to Learn In Context

@article{Min2021MetaICLLT,
  title={MetaICL: Learning to Learn In Context},
  author={Sewon Min and Mike Lewis and Luke Zettlemoyer and Hannaneh Hajishirzi},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.15943}
}
We introduce MetaICL (Meta-training for In-Context Learning), a new meta-training framework for few-shot learning where a pretrained language model is tuned to do in-context learning on a large set of training tasks. This meta-training enables the model to more effectively learn a new task in context at test time, by simply conditioning on a few training examples with no parameter updates or task-specific templates. We experiment on a large, diverse collection of tasks consisting of 142 NLP… 
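In other words, meta-training reduces to ordinary next-token prediction over sequences that concatenate k demonstrations from a sampled task with a query, with the loss restricted to the query's label; at test time the same concatenation is used purely as conditioning, with no gradient step. The sketch below is a minimal, hypothetical illustration of that data format using Hugging Face transformers; the separator, the value of k, and the loss-masking details are assumptions rather than the authors' exact preprocessing.

```python
# Minimal sketch of a MetaICL-style meta-training instance (format details are assumed):
# concatenate k (input, output) demonstrations from one task with a query input, and
# compute the LM loss only on the query's output tokens.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def build_metaicl_instance(demonstrations, query_x, query_y):
    context = "".join(f"{x} {y}\n" for x, y in demonstrations)      # newline separator (assumed)
    prompt_ids = tokenizer(context + query_x + " ")["input_ids"]
    target_ids = tokenizer(query_y)["input_ids"]
    input_ids = torch.tensor([prompt_ids + target_ids])
    labels = torch.tensor([[-100] * len(prompt_ids) + target_ids])  # -100 = ignored by the loss
    return input_ids, labels

demos = [("Review: great movie!", "positive"), ("Review: waste of time.", "negative")]
input_ids, labels = build_metaicl_instance(demos, "Review: loved every minute.", "positive")
loss = model(input_ids=input_ids, labels=labels).loss   # standard LM loss on the label tokens
loss.backward()                                          # meta-training update; none at test time
```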

Meta-learning via Language Model In-context Tuning

In-context tuning (ICT) leverages the inductive bias of pre-trained LMs to perform pattern matching, and outperforms MAML by an absolute 6% in average AUC-ROC on BinaryClfs, with the advantage growing as model size increases.

In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models

This work proposes to combine in-context learning objectives with language modeling objectives to distill both the ability to read in-context examples and task knowledge into smaller models, and shows consistent improvements for both Meta-ICT and Multitask-ICT on two benchmarks.
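One way to read "combining in-context learning objectives with language modeling objectives" is as a weighted sum of a distillation term on the teacher's in-context predictions and an ordinary language-modeling term. The sketch below is only an illustration under that assumption; the weighting `alpha`, temperature `T`, and the exact objectives are not taken from the paper.

```python
# Illustrative combined loss for in-context learning distillation (an assumption,
# not the paper's exact formulation): a KL term toward the teacher's in-context
# predictions plus a standard language-modeling term for the student.
import torch.nn.functional as F

def distillation_lm_loss(student_logits, teacher_logits, lm_labels, alpha=0.5, T=2.0):
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                     # soften both distributions, rescale
    lm = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        lm_labels.view(-1),
        ignore_index=-100,                          # positions outside the target are masked
    )
    return alpha * kd + (1.0 - alpha) * lm
```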

Boosting Natural Language Generation from Instructions with Meta-Learning

This paper proposes to adapt meta-learning to MTIL in three directions: 1) Model-Agnostic Meta-Learning (MAML), 2) Hyper-Network (HNet) based adaptation to generate task-specific parameters conditioned on instructions, and 3) an approach combining HNet and MAML.
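For reference, the MAML direction boils down to a two-level loop: adapt a copy of the model on a task's support set, then update the shared parameters from the adapted copy's query-set loss. The following is a generic first-order sketch, assuming a Hugging-Face-style model that returns a `.loss` when given labels; it is not the paper's implementation.

```python
# Generic first-order MAML loop (a sketch, not the paper's implementation).
# Assumes model(**batch) returns an object with a .loss, as Hugging Face models do.
import copy
import torch

def maml_step(model, tasks, meta_optimizer, inner_lr=1e-4, inner_steps=1):
    meta_optimizer.zero_grad()
    for support_batch, query_batch in tasks:              # each task: (support, query)
        adapted = copy.deepcopy(model)                    # task-specific fast weights
        inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                      # inner loop: adapt to the task
            inner_opt.zero_grad()
            adapted(**support_batch).loss.backward()
            inner_opt.step()
        query_loss = adapted(**query_batch).loss          # outer loop: evaluate the adaptation
        grads = torch.autograd.grad(query_loss, list(adapted.parameters()))
        for p, g in zip(model.parameters(), grads):       # first-order approximation:
            p.grad = g if p.grad is None else p.grad + g  # reuse grads for the meta-parameters
    meta_optimizer.step()
```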

Preserving In-Context Learning ability in Large Language Model Fine-tuning

This work proposes ProMoT, a simple yet effective two-stage fine-tuning framework that preserves the in-context abilities of the pretrained model and shows remarkable generalization on tasks with different formats, e.g. natural language inference and English-French translation.

On the Effect of Pretraining Corpora on In-context Learning by a Large-scale Language Model

This in-depth investigation of the effects of the source and size of the pretraining corpus on in-context learning in HyperCLOVA, a Korean-centric GPT-3-style model, finds that in-context learning performance depends heavily on the domain source of the corpus.

Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners

FLIPPED gives particularly large improvements on unseen labels, outperforming T0-11B by up to +20% average F1 score, indicating that the strong task generalization of FLIPPED comes from improved generalization to novel labels.

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?

This paper shows that ground-truth demonstrations are in fact not required, and that other aspects of the demonstrations are the key drivers of end-task performance: exposing the label space, the distribution of the input text, and the overall format of the sequence.
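The manipulation behind this finding can be reproduced by formatting demonstrations with either their gold labels or labels drawn at random from the label set, keeping everything else fixed. The snippet below is a hypothetical illustration; the prompt template and label set are assumptions.

```python
# Gold-label vs. random-label demonstrations (illustrative; template and labels assumed).
import random

LABELS = ["positive", "negative"]

def format_prompt(demonstrations, test_input, randomize_labels=False):
    lines = []
    for x, y in demonstrations:
        label = random.choice(LABELS) if randomize_labels else y   # keep format, drop correctness
        lines.append(f"Review: {x}\nSentiment: {label}")
    lines.append(f"Review: {test_input}\nSentiment:")
    return "\n\n".join(lines)

demos = [("A moving, beautifully shot film.", "positive"),
         ("Two hours I will never get back.", "negative")]
print(format_prompt(demos, "Surprisingly funny and heartfelt.", randomize_labels=True))
```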

What is Not in the Context? Evaluation of Few-shot Learners with Informative Demonstrations

This work argues that the commonly used evaluation setting for few-shot models, which relies on a random selection of in-context demonstrations, cannot disentangle a model's ability to learn new skills from demonstrations, since most randomly selected demonstrations are not informative for prediction beyond exposing the new task's input and output distribution.

Improving In-Context Few-Shot Learning via Self-Supervised Training

This paper proposes to use self-supervision in an intermediate training stage between pretraining and downstream few-shot usage, with the goal of teaching the model to perform in-context few-shot learning.

Finetuned Language Models Are Zero-Shot Learners

It is shown that instruction tuning, i.e. finetuning language models on a collection of datasets described via instructions, substantially improves zero-shot performance on unseen tasks and outperforms few-shot GPT-3 by a large margin on ANLI, RTE, BoolQ, AI2-ARC, OpenbookQA, and StoryCloze.
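Concretely, each training example is rendered into (source, target) text by a template that states the task in natural language. The template below is hypothetical (an entailment prompt of my own construction), shown only to make the data format concrete.

```python
# Hypothetical instruction-style template (not one of the paper's actual templates):
# the task description is written in natural language and each example becomes an
# (input, target) text pair for ordinary sequence-to-sequence finetuning.
def render_instruction_example(premise, hypothesis, label):
    instruction = ("Read the premise and the hypothesis. Answer 'yes' if the premise "
                   "entails the hypothesis, otherwise answer 'no'.\n")
    source = f"{instruction}Premise: {premise}\nHypothesis: {hypothesis}\nAnswer:"
    target = "yes" if label == "entailment" else "no"
    return source, target

src, tgt = render_instruction_example(
    "A man is playing a guitar on stage.", "A person is performing music.", "entailment")
print(src)
print(tgt)   # -> "yes"
```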
...

References

SHOWING 1-10 OF 136 REFERENCES

Meta-learning via Language Model In-context Tuning

In-context tuning (ICT) leverages the inductive bias of pre-trained LMs to perform pattern matching, and outperforms MAML by an absolute 6% in average AUC-ROC on BinaryClfs, with the advantage growing as model size increases.

CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP

This paper presents the NLP Few-shot Gym, a repository of 160 diverse few-shot NLP tasks created from open-access NLP datasets and converted to a unified text-to-text format, and reveals that the few-shot learning ability on unseen tasks can be improved via an upstream learning stage using a set of seen tasks.
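The unified text-to-text format means every task, whatever its original structure, is reduced to plain (input, output) string pairs. The converters below are a hypothetical illustration of that idea; the field prefixes and separators are assumptions, not CrossFit's exact formatting.

```python
# Hypothetical converters to a unified text-to-text format (field prefixes and
# separators are assumptions, not CrossFit's exact formatting).
def classification_to_text(text, label):
    return {"input": f"text: {text}", "output": label}

def qa_to_text(question, context, answer):
    return {"input": f"question: {question} context: {context}", "output": answer}

examples = [
    classification_to_text("The food was cold and bland.", "negative"),
    qa_to_text("Who wrote Hamlet?",
               "Hamlet is a tragedy written by William Shakespeare.",
               "William Shakespeare"),
]
for ex in examples:
    print(ex["input"], "->", ex["output"])   # every task is now a plain (input, output) pair
```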

Muppet: Massive Multi-task Representations with Pre-Finetuning

It is shown that pre-finetuning consistently improves performance for pretrained discriminators and generation models on a wide range of tasks while also significantly improving sample efficiency during fine-tuning, and that large-scale multi-tasking is crucial.

Multitask Prompted Training Enables Zero-Shot Task Generalization

This work builds a system for easily mapping any natural language task into a human-readable prompted form and fine-tunes a pretrained encoder-decoder model on a multitask mixture covering a wide variety of tasks.

Language Models are Unsupervised Multitask Learners

It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.

Finetuned Language Models Are Zero-Shot Learners

It is shown that instruction tuning, i.e. finetuning language models on a collection of datasets described via instructions, substantially improves zero-shot performance on unseen tasks and outperforms few-shot GPT-3 by a large margin on ANLI, RTE, BoolQ, AI2-ARC, OpenbookQA, and StoryCloze.

Calibrate Before Use: Improving Few-Shot Performance of Language Models

This work first estimates the model's bias towards each answer by asking for its prediction when given the training prompt and a content-free test input such as "N/A", and then fits calibration parameters that cause the prediction for this input to be uniform across answers.
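As a concrete sketch of that calibration step (under the simplifying assumption of a single content-free input and per-label probabilities already extracted from the model), one can invert the content-free probabilities and renormalize; the helper below is hypothetical, not the paper's released code.

```python
# Sketch of contextual calibration: estimate the prompt-induced bias from a content-free
# input such as "N/A", then rescale label probabilities so that input scores uniformly.
import numpy as np

def calibrate(label_probs_cf, label_probs_test):
    """label_probs_cf: label probabilities for the content-free input under the prompt.
    label_probs_test: label probabilities for a real test input under the same prompt."""
    W = np.diag(1.0 / np.asarray(label_probs_cf))   # undo the bias toward favored labels
    scores = W @ np.asarray(label_probs_test)
    return scores / scores.sum()                    # renormalize into a distribution

p_cf = [0.7, 0.3]               # e.g. the prompt biases the model toward the first label
p_test = [0.6, 0.4]
print(calibrate(p_cf, p_test))  # -> approximately [0.39, 0.61]; the prompt bias is removed
```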

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

GLUE provides a benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models' understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models; it favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge transfer across tasks.

True Few-Shot Learning with Language Models

This work evaluates the few-shot ability of LMs when held-out examples for prompt and hyperparameter selection are unavailable, a setting the authors call true few-shot learning, and suggests that prior work significantly overestimated the true few-shot ability of LMs given the difficulty of few-shot model selection.

How Context Affects Language Models' Factual Predictions

This paper reports that augmenting pre-trained language models with relevant retrieved context dramatically improves their factual predictions, and that the resulting system, despite being unsupervised, is competitive with a supervised machine reading baseline.
...