MetaICL: Learning to Learn In Context

Sewon Min, Mike Lewis, Luke Zettlemoyer, Hannaneh Hajishirzi
We introduce MetaICL (Meta-training for In-Context Learning), a new meta-training framework for few-shot learning where a pretrained language model is tuned to do in-context learning on a large set of training tasks. This meta-training enables the model to more effectively learn a new task in context at test time, by simply conditioning on a few training examples with no parameter updates or task-specific templates. We experiment on a large, diverse collection of tasks consisting of 142 NLP… 
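To make "conditioning on a few training examples" concrete, the following is a minimal sketch of how an in-context input could be assembled by concatenating k demonstrations with the test input. The function name `build_icl_input`, the newline separator, and the toy sentiment examples are assumptions for illustration; the abstract does not specify the exact format.

```python
from typing import List, Tuple


def build_icl_input(demos: List[Tuple[str, str]], test_input: str, sep: str = "\n") -> str:
    """Concatenate k (input, output) demonstrations, then append the test input.

    Hypothetical helper: the separator and the ordering are illustrative choices.
    """
    parts: List[str] = []
    for x, y in demos:
        parts.append(x)  # demonstration input
        parts.append(y)  # demonstration output
    parts.append(test_input)  # the model is asked to continue from here
    return sep.join(parts)


# Toy demonstrations (made up for this sketch).
demos = [("great movie", "positive"), ("dull plot", "negative")]
prompt = build_icl_input(demos, "loved every minute")
print(prompt)
```

No parameters are updated in this step: the same concatenated string is simply fed to the model, both during meta-training on the training tasks and at test time on a new task.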

Boosting Natural Language Generation from Instructions with Meta-Learning

This paper proposes to adapt meta-learning to multi-task instructional learning (MTIL) in three directions: 1) Model-Agnostic Meta-Learning (MAML), 2) hyper-network (HNet) based adaptation to generate task-specific parameters conditioned on instructions, and 3) an approach combining HNet and MAML.

Preserving In-Context Learning ability in Large Language Model Fine-tuning

ProMoT, a simple yet effective two-stage fine-tuning framework, is proposed; it preserves the in-context abilities of the pretrained model and shows remarkable generalization ability on tasks that have different formats, e.g. natural language inference and English-French translation.

On the Effect of Pretraining Corpora on In-context Learning by a Large-scale Language Model

This in-depth investigation of the effects of the source and size of the pretraining corpus on in-context learning in HyperCLOVA, a Korean-centric GPT-3 model, finds that in-context learning performance heavily depends on the corpus domain source.

Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners

FLIPPED gives particularly large improvements on unseen labels, outperforming T0-11B by up to +20% average F1 score, indicating that the strong task generalization of FLIPPED comes from improved generalization to novel labels.

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?

This paper shows that ground truth demonstrations are in fact not required and that other aspects of the demonstrations are the key drivers of end task performance, including the fact that they provide a few examples of the label space, the distribution of the input text, and the overall format of the sequence.

Improving In-Context Few-Shot Learning via Self-Supervised Training

This paper proposes to use self-supervision in an intermediate training stage between pretraining and downstream few-shot usage with the goal to teach the model to perform in-context few shot learning.

Finetuned Language Models Are Zero-Shot Learners

It is shown that instruction tuning (finetuning language models on a collection of datasets described via instructions) substantially improves zero-shot performance on unseen tasks and outperforms few-shot GPT-3 by a large margin on ANLI, RTE, BoolQ, AI2-ARC, OpenbookQA, and StoryCloze.

What Can Transformers Learn In-Context? A Case Study of Simple Function Classes

It is shown empirically that standard Transformers can be trained from scratch to perform in-context learning of linear functions; that is, the trained model is able to learn unseen linear functions from in-context examples with performance comparable to the optimal least squares estimator.
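For reference, the least squares baseline mentioned in this summary can be sketched as follows. This shows only the classical estimator fitted on the in-context examples, not the Transformer; the dimension, number of examples, and noiseless targets are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 5, 20  # input dimension and number of in-context examples (illustrative)

w_true = rng.normal(size=d)  # the "unseen" linear function f(x) = w_true . x
X = rng.normal(size=(k, d))  # in-context inputs
y = X @ w_true               # noiseless in-context targets (an assumption)

# Least squares estimate computed from the in-context examples alone.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Prediction for a new query point.
x_query = rng.normal(size=d)
pred = x_query @ w_hat
```

With more examples than dimensions and no noise, the estimator recovers the underlying weights, which is the reference point a trained model's in-context predictions would be compared against.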

Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning

A new parameter-efficient fine-tuning method called (IA)³ is introduced that scales activations by learned vectors, attaining stronger performance while introducing only a relatively tiny number of new parameters.
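The core operation described here, rescaling activations by a learned vector, can be sketched in a few lines. The function name `ia3_scale`, the toy shapes, and the vector values are hypothetical; where in the network the rescaling is applied is not stated in this summary and is not modelled.

```python
import numpy as np


def ia3_scale(activations: np.ndarray, l: np.ndarray) -> np.ndarray:
    """Rescale activations elementwise by a vector, broadcast over the token axis."""
    return activations * l


hidden = np.ones((3, 4))                # 3 tokens, 4 features (toy values)
l_vec = np.array([1.0, 0.5, 2.0, 0.0])  # hypothetical learned scaling vector
scaled = ia3_scale(hidden, l_vec)
```

Only the vector `l_vec` would be trained, which is why the number of new parameters stays small: one value per feature rather than a full weight matrix. A vector of ones leaves the activations unchanged.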

Few-shot Learning with Multilingual Language Models

This work trains multilingual generative language models on a corpus covering a diverse set of languages, and conducts an in-depth analysis of different multilingual prompting approaches, showing in particular that strong in-context few-shot learning performance across languages can be achieved via cross-lingual transfer through both templates and demonstration examples.



CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP

This paper presents the NLP Few-shot Gym, a repository of 160 diverse few-shot NLP tasks created from open-access NLP datasets and converted to a unified text-to-text format, and reveals that the few-shot learning ability on unseen tasks can be improved via an upstream learning stage using a set of seen tasks.

Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections

Meta-tuning is proposed, which directly optimizes the zero-shot learning objective by finetuning pre-trained language models on a collection of datasets, built by aggregating 43 existing datasets and annotating 441 label descriptions in a question-answering (QA) format.

Muppet: Massive Multi-task Representations with Pre-Finetuning

It is shown that pre-finetuning consistently improves performance for pretrained discriminators and generation models on a wide range of tasks while also significantly improving sample efficiency during fine-tuning, and that large-scale multi-tasking is crucial.

Multitask Prompted Training Enables Zero-Shot Task Generalization

A system is developed for easily mapping any natural language task into a human-readable prompted form, and a pretrained encoder-decoder model is fine-tuned on this multitask mixture covering a wide variety of tasks.

Language Models are Unsupervised Multitask Learners

It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.

Calibrate Before Use: Improving Few-Shot Performance of Language Models

This work first estimates the model's bias towards each answer by asking for its prediction when given the training prompt and a content-free test input such as "N/A", and then fits calibration parameters that cause the prediction for this input to be uniform across answers.
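The calibration step described above can be sketched as follows, assuming a diagonal scaling matrix with no bias term, which is one simple way to make the content-free prediction uniform. The function names and all probability values are made up for illustration.

```python
import numpy as np


def fit_calibration(p_content_free: np.ndarray) -> np.ndarray:
    """Return a diagonal matrix W so that W @ p_content_free is uniform after normalizing."""
    p = np.asarray(p_content_free, dtype=float)
    return np.diag(1.0 / p)


def calibrate(p: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Apply the calibration matrix and renormalize to a probability vector."""
    q = W @ np.asarray(p, dtype=float)
    return q / q.sum()


# Model's label probabilities for a content-free input such as "N/A" (made-up numbers):
# the model is biased towards the first answer.
p_cf = np.array([0.7, 0.3])
W = fit_calibration(p_cf)

# Calibrating a real test prediction can change the predicted answer.
p_test = np.array([0.6, 0.4])
q_test = calibrate(p_test, W)
```

In this toy case the uncalibrated prediction favours the first answer, but after dividing out the bias measured on the content-free input, the second answer has the higher calibrated probability.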

Cross-Task Generalization via Natural Language Crowdsourcing Instructions

This work introduces NATURAL INSTRUCTIONS, a dataset of 61 distinct tasks, their human-authored instructions, and 193k task instances, and adopts generative pre-trained language models to encode task-specific instructions along with input and generate task output.

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning…
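As a toy illustration of gradient-based meta-learning in this spirit, the sketch below uses a single scalar parameter and quadratic per-task losses so that the gradient through the inner adaptation step can be written by hand. The loss, the step sizes `alpha` and `beta`, and the task targets are all assumptions; a real model would rely on automatic differentiation.

```python
def maml_step(theta: float, task_targets, alpha: float = 0.1, beta: float = 0.5) -> float:
    """One meta-update for scalar tasks with loss L_c(theta) = 0.5 * (theta - c) ** 2."""
    meta_grad = 0.0
    for c in task_targets:
        # Inner loop: one gradient step on the task loss.
        adapted = theta - alpha * (theta - c)
        # Outer gradient: d/d(theta) of L_c(adapted), differentiating through
        # the inner step, whose derivative w.r.t. theta is (1 - alpha).
        meta_grad += (adapted - c) * (1.0 - alpha)
    return theta - beta * meta_grad / len(task_targets)


theta = 0.0
for _ in range(200):
    theta = maml_step(theta, [1.0, 3.0])
```

For these two symmetric tasks the meta-learned initialization settles midway between the task optima, the point from which one inner step helps both tasks equally.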

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models, which favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge-transfer across tasks.