MetaICL: Learning to Learn In Context

@article{Min2022MetaICLLT,
  title={MetaICL: Learning to Learn In Context},
  author={Sewon Min and Mike Lewis and Luke Zettlemoyer and Hannaneh Hajishirzi},
  journal={ArXiv},
  year={2022},
  volume={abs/2110.15943}
}
We introduce MetaICL (Meta-training for In-Context Learning), a new meta-training framework for few-shot learning where a pretrained language model is tuned to do in-context learning on a large set of training tasks. This meta-training enables the model to more effectively learn a new task in context at test time, by simply conditioning on a few training examples with no parameter updates or task-specific templates. We experiment on a large, diverse collection of tasks consisting of 142 NLP… 
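
As a concrete illustration of the inference recipe described in the abstract, the sketch below shows MetaICL-style test-time prediction with a generic Hugging Face causal LM: a few demonstrations and the test input are concatenated into a single prompt, and each candidate label is scored by its conditional log-likelihood, with no parameter updates. The gpt2 checkpoint and the newline-separated template are assumptions made for illustration, not the paper's exact model or format.

```python
# Minimal sketch of MetaICL-style inference: concatenate k demonstrations with
# the test input and pick the candidate label whose continuation the frozen LM
# scores highest. Model name and template are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def label_logprob(prompt: str, label: str) -> float:
    """Sum of log-probabilities the LM assigns to `label` given `prompt`."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    label_ids = tok(" " + label, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, label_ids], dim=1)
    with torch.no_grad():
        logprobs = lm(input_ids).logits.log_softmax(-1)
    # positions prompt_len-1 .. end-1 predict the label tokens
    preds = logprobs[0, prompt_ids.shape[1] - 1 : -1]
    return preds.gather(1, label_ids[0].unsqueeze(1)).sum().item()

demos = [("the movie was great", "positive"), ("a total waste of time", "negative")]
prompt = "".join(f"{x}\n{y}\n\n" for x, y in demos) + "an unforgettable performance\n"
labels = ["positive", "negative"]
print(max(labels, key=lambda y: label_logprob(prompt, y)))  # no parameter updates
```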

Meta-learning via Language Model In-context Tuning

TLDR
ICT leverages the inductive bias of pre-trained LMs to perform pattern matching, and outperforms MAML by an absolute 6% average AUC-ROC score on BinaryClfs, gaining more advantage with increasing model size.

On the Effect of Pretraining Corpora on In-context Learning by a Large-scale Language Model

TLDR
This in-depth investigation of the effects of the source and size of the pretraining corpus on in-context learning in HyperCLOVA, a Korean-centric GPT-3 model, finds that in-context learning performance heavily depends on the corpus domain source.

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?

TLDR
It is shown that ground-truth demonstrations are in fact not required, and that other aspects of the demonstrations are the key drivers of end-task performance: they provide examples of the label space, the distribution of the input text, and the overall format of the sequence.
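
The manipulation behind this finding can be illustrated with a small sketch: keep the demonstration inputs, label space, and format fixed, but replace the gold labels with randomly sampled ones. The sentiment template and examples below are assumptions for illustration, not the paper's datasets.

```python
# Keep the inputs, label space, and format of the demonstrations, but swap the
# gold labels for randomly sampled ones. Template and examples are illustrative.
import random

demos = [("a gripping, well-acted thriller", "positive"),
         ("the plot never comes together", "negative")]
label_space = ["positive", "negative"]

def build_prompt(pairs, test_input):
    body = "".join(f"Review: {x}\nSentiment: {y}\n\n" for x, y in pairs)
    return body + f"Review: {test_input}\nSentiment:"

gold_prompt = build_prompt(demos, "an unforgettable performance")
random_prompt = build_prompt(
    [(x, random.choice(label_space)) for x, _ in demos],
    "an unforgettable performance",
)
# Finding: conditioning on random_prompt performs nearly as well as on
# gold_prompt, since the demonstrations still convey the label space,
# the input distribution, and the overall format.
```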

Improving In-Context Few-Shot Learning via Self-Supervised Training

TLDR
This paper proposes to use self-supervision in an intermediate training stage between pretraining and downstream few-shot usage, with the goal of teaching the model to perform in-context few-shot learning.

Finetuned Language Models Are Zero-Shot Learners

TLDR
It is shown that instruction tuning (finetuning language models on a collection of datasets described via instructions) substantially improves zero-shot performance on unseen tasks and outperforms few-shot GPT-3 by a large margin on ANLI, RTE, BoolQ, AI2-ARC, OpenbookQA, and StoryCloze.

What Can Transformers Learn In-Context? A Case Study of Simple Function Classes

TLDR
It is shown empirically that standard Transformers can be trained from scratch to perform in-context learning of linear functions: the trained model is able to learn unseen linear functions from in-context examples with performance comparable to the optimal least-squares estimator.
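
A brief sketch of this evaluation setup (with assumed details such as dimensions and sample counts): draw a random linear function, present input-output pairs in context, and compare the model against the least-squares estimator fit to those same pairs.

```python
# Sample a random linear function, show (x_i, <w, x_i>) pairs in context, and
# compare against the optimal least-squares estimator fit to those pairs.
import numpy as np

rng = np.random.default_rng(0)
d, k = 20, 40                       # input dimension, in-context examples (assumed)
w = rng.normal(size=d)              # hidden linear function
X = rng.normal(size=(k, d))
y = X @ w                           # in-context labels
x_query = rng.normal(size=d)

w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares baseline
baseline_pred = x_query @ w_hat
true_value = x_query @ w
# A Transformer trained from scratch on such prompts is reported to predict
# the query output about as accurately as baseline_pred.
print(baseline_pred, true_value)
```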

Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning

TLDR
A new parameter-efficient fine-tuning method called (IA)³ is proposed that scales activations by learned vectors, attaining stronger performance while introducing only a relatively tiny number of new parameters.
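
The core mechanism named here, scaling activations by learned vectors while the pretrained weights stay frozen, can be sketched as follows. This is a generic illustration of the idea, not the released (IA)³ implementation; the module and dimension choices are assumptions.

```python
# Learned per-dimension scaling vectors multiply intermediate activations
# (in the paper: keys, values, and FFN hiddens) while the base weights stay
# frozen. Generic illustration, not the released implementation.
import torch
import torch.nn as nn

class ScaledLinear(nn.Module):
    """Wraps a frozen linear layer and rescales its output element-wise."""
    def __init__(self, frozen_linear: nn.Linear):
        super().__init__()
        self.linear = frozen_linear
        for p in self.linear.parameters():
            p.requires_grad = False                   # base model stays frozen
        self.scale = nn.Parameter(torch.ones(frozen_linear.out_features))

    def forward(self, x):
        return self.linear(x) * self.scale            # activation * learned vector

layer = ScaledLinear(nn.Linear(768, 768))
_ = layer(torch.randn(4, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 768 new parameters next to ~590k frozen ones
```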

Towards Unified Prompt Tuning for Few-shot Text Classification

TLDR
A novel paradigm, Prompt-Options-Verbalizer, is proposed for joint prompt learning across different NLP tasks, forcing PLMs to capture task-invariant prompting knowledge, and a self-supervised task named Knowledge-enhanced Selective Masked Language Modeling is designed to improve the PLM's generalization ability for accurate adaptation to previously unseen tasks.

Few-shot Learning with Multilingual Language Models

TLDR
A detailed analysis of where the model succeeds and fails is presented, showing in particular that it enables cross-lingual in-context learning on some tasks, while there is still room for improvement on surface form robustness and adaptation to tasks that do not have a natural cloze form.

In-Context Learning for Few-Shot Dialogue State Tracking

TLDR
This work proposes an in-context (IC) learning framework for zero-shot and few-shot dialogue state tracking (DST), in which a large pretrained language model (LM) takes a test instance and a few exemplars as input and directly decodes the dialogue state without any parameter updates.
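
A rough sketch of this setup: serialize a few (dialogue, state) exemplars together with the test dialogue into one prompt and let the LM decode the state string. The slot serialization and exemplars below are illustrative assumptions, not the paper's retrieval or formatting scheme.

```python
# Serialize exemplars plus the test dialogue into one prompt; the LM then
# decodes the dialogue state directly, with no parameter updates.
exemplars = [
    ("User: I need a cheap hotel in the north.",
     "hotel-pricerange=cheap; hotel-area=north"),
    ("User: Book a table for two at an Italian place.",
     "restaurant-food=italian; restaurant-book-people=2"),
]
test_dialogue = "User: Find me an expensive restaurant in the centre."

prompt = "".join(f"{dialogue}\nState: {state}\n\n" for dialogue, state in exemplars)
prompt += f"{test_dialogue}\nState:"
print(prompt)  # feed to a large pretrained LM and decode the state string
```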

References


Meta-learning via Language Model In-context Tuning

TLDR
ICT leverages the inductive bias of pre-trained LMs to perform pattern matching, and outperforms MAML by an absolute 6% average AUC-ROC score on BinaryClfs, gaining more advantage with increasing model size.

CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP

TLDR
This paper presents the NLP Few-shot Gym, a repository of 160 diverse few-shot NLP tasks created from open-access NLP datasets and converted to a unified text-to-text format, and reveals that the few-shot learning ability on unseen tasks can be improved via an upstream learning stage using a set of seen tasks.

Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections

TLDR
Meta-tuning is proposed, which directly optimizes the zero-shot learning objective by finetuning pre-trained language models on a collection of datasets built by aggregating 43 existing datasets and annotating 441 label descriptions in a question-answering (QA) format.

Muppet: Massive Multi-task Representations with Pre-Finetuning

TLDR
It is shown that pre-finetuning consistently improves performance for pretrained discriminators and generation models on a wide range of tasks while also significantly improving sample efficiency during fine-tuning, and that large-scale multi-tasking is crucial.

Multitask Prompted Training Enables Zero-Shot Task Generalization

TLDR
A system is developed for easily mapping any natural language task into a human-readable prompted form, and a pretrained encoder-decoder model is fine-tuned on this multitask mixture covering a wide variety of tasks.

Language Models are Unsupervised Multitask Learners

TLDR
It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.

Finetuned Language Models Are Zero-Shot Learners

TLDR
It is shown that instruction tuning (finetuning language models on a collection of datasets described via instructions) substantially improves zero-shot performance on unseen tasks and outperforms few-shot GPT-3 by a large margin on ANLI, RTE, BoolQ, AI2-ARC, OpenbookQA, and StoryCloze.

Calibrate Before Use: Improving Few-Shot Performance of Language Models

TLDR
This work first estimates the model's bias towards each answer by asking for its prediction when given the training prompt and a content-free test input such as "N/A", and then fits calibration parameters that cause the prediction for this input to be uniform across answers.
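
The calibration recipe summarized here is simple enough to sketch: measure the model's label probabilities on a content-free input such as "N/A", treat them as the bias, and rescale test-time probabilities so that the content-free prediction becomes uniform. The probabilities below are made-up placeholders for real LM outputs.

```python
# Contextual calibration: divide test-time label probabilities by the
# probabilities obtained for a content-free input, then renormalize, so the
# content-free prediction becomes uniform.
import numpy as np

def calibrate(p_test: np.ndarray, p_content_free: np.ndarray) -> np.ndarray:
    q = p_test / p_content_free     # diagonal correction W = diag(1 / p_cf), b = 0
    return q / q.sum()

p_cf = np.array([0.7, 0.3])          # bias measured with the "N/A" input (made up)
label_probs = np.array([0.6, 0.4])   # hypothetical probabilities for a test input
print(calibrate(label_probs, p_cf))  # bias toward the first label is undone
print(calibrate(p_cf, p_cf))         # content-free input now maps to uniform
```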

Cross-Task Generalization via Natural Language Crowdsourcing Instructions

TLDR
This work introduces NATURAL INSTRUCTIONS, a dataset of 61 distinct tasks, their human-authored instructions, and 193k task instances, and adopts generative pre-trained language models to encode task-specific instructions along with input and generate task output.

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning
...
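
The algorithm this entry describes, MAML, can be sketched in a few lines. The toy below is an assumed, minimal illustration on scalar linear-regression tasks: one inner gradient step adapts the meta-parameter on a task's support set, and the outer update differentiates through that step (here via an analytic chain rule rather than autograd); it is not the paper's implementation.

```python
# Toy MAML on scalar linear-regression tasks y = a * x with one parameter w.
# Inner loop: one gradient step per task; outer loop: differentiate through it.
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, w = 0.05, 0.01, 1.5     # inner lr, outer lr, meta-parameter

def grad(w, x, y):                   # d/dw of mean((w * x - y) ** 2)
    return 2.0 * np.mean((w * x - y) * x)

for step in range(2000):
    a = rng.uniform(-2.0, 2.0)                        # sample a task (target slope)
    x_tr, x_va = rng.normal(size=10), rng.normal(size=10)
    y_tr, y_va = a * x_tr, a * x_va
    w_adapted = w - alpha * grad(w, x_tr, y_tr)       # inner step on support set
    dw_adapted_dw = 1.0 - alpha * 2.0 * np.mean(x_tr ** 2)   # d w_adapted / d w
    w -= beta * grad(w_adapted, x_va, y_va) * dw_adapted_dw  # outer (meta) update
```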