Data-Efficient Finetuning Using Cross-Task Nearest Neighbors

Hamish Ivison, Noah A. Smith, Hannaneh Hajishirzi, Pradeep Dasigi
Obtaining labeled data to train a model for a task of interest is often expensive. Prior work shows training models on multitask data augmented with task descriptions (prompts) effectively transfers knowledge to new tasks. Towards efficiently building task-specific models, we assume access to a small number (32-1000) of unlabeled target-task examples and use those to retrieve the most similar labeled examples from a large pool of multitask data augmented with prompts. Compared to the current… 
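The retrieval step the abstract describes — using a handful of unlabeled target-task examples to pull the most similar labeled examples out of a large prompted multitask pool — can be sketched as nearest-neighbor search over example embeddings. This is a minimal illustration, not the paper's implementation: the toy 2-d vectors stand in for sentence embeddings, and the union-of-top-k selection is an assumption about how per-query retrievals are pooled.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve_cross_task_neighbors(query_vecs, pool_vecs, k):
    """For each unlabeled target-task query, take its k most similar
    examples from the multitask pool; return the union of retrieved
    pool indices as the task-specific finetuning set."""
    selected = set()
    for q in query_vecs:
        ranked = sorted(range(len(pool_vecs)),
                        key=lambda i: cosine(q, pool_vecs[i]),
                        reverse=True)
        selected.update(ranked[:k])
    return sorted(selected)

# Toy vectors standing in for embeddings of prompted examples.
queries = [[1.0, 0.0], [0.9, 0.1]]   # unlabeled target-task examples
pool = [[1.0, 0.1], [0.0, 1.0], [0.8, 0.0], [-1.0, 0.0]]
print(retrieve_cross_task_neighbors(queries, pool, k=2))  # → [0, 2]
```

In practice the pool would hold hundreds of thousands of prompted examples, so an approximate index (e.g. FAISS) would replace the exhaustive sort.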


TaskWeb: Selecting Better Source Tasks for Multi-task NLP

This work provides TaskWeb, a large-scale benchmark of pairwise task transfers for 22 NLP tasks spanning three different model types, sizes, and adaptation methods, and designs TaskShop, a new method for selecting source tasks based on analysis of TaskWeb.

Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data

This work relates FLAD to the explore-exploit dilemma central to the multi-armed bandit setting and derives algorithms whose computational complexity is independent of the number of auxiliary datasets, allowing them to scale to 100x more auxiliary datasets than prior methods.

Maybe Only 0.5% Data is Needed: A Preliminary Exploration of Low Training Data Instruction Tuning

A preliminary exploration into reducing the data used in LLM instruction tuning, yielding several observations about task specialization for LLM training: how to optimize performance for a specific task, how many instruction types instruction tuning requires, and how much data task-specific models need.

What to Pre-Train on? Efficient Intermediate Task Selection

This work provides a comprehensive comparison of methods for efficiently identifying beneficial tasks for intermediate transfer learning, focuses on parameter- and computation-efficient adapter settings, highlights different data-availability scenarios, and provides expense estimates for each method.

Language Models are Few-Shot Learners

GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.

Efficient Multi-Task Auxiliary Learning: Selecting Auxiliary Data by Feature Similarity

A time-efficient sampling method selects the data most relevant to the primary task and can surpass fully-trained MT-DNN on RTE, MRPC, and STS-B using only 50%, 66%, and 1% of the data, respectively.

Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning

This paper rigorously compares few-shot ICL and PEFT and demonstrates that the latter offers better accuracy as well as dramatically lower computational costs, and introduces a new PEFT method called (IA)$^3$ that scales activations by learned vectors, attaining stronger performance while only introducing a relatively tiny amount of new parameters.
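The core operation of (IA)$^3$ — rescaling intermediate activations element-wise by learned vectors while the backbone stays frozen — can be sketched in a few lines. This is a simplified illustration under stated assumptions (toy lists in place of tensors; in the actual method the scales are applied to keys, values, and FFN activations inside a Transformer):

```python
def ia3_scale(activations, learned_vector):
    """(IA)^3-style rescaling: multiply each activation dimension by a
    learned per-dimension scale. Scales start at 1.0, so the frozen
    model's behavior is unchanged before any training happens."""
    return [[a * s for a, s in zip(row, learned_vector)]
            for row in activations]

# Toy: a batch of two 3-dim "key" activations from a frozen layer.
keys = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
l_k = [1.0, 1.0, 1.0]          # identity init: output == input
assert ia3_scale(keys, l_k) == keys
l_k = [0.5, 1.0, 2.0]          # after training, scales have adapted
print(ia3_scale(keys, l_k))    # → [[0.5, 2.0, 6.0], [2.0, 5.0, 12.0]]
```

Only the scale vectors are trained, which is why the method adds so few parameters: one scalar per rescaled dimension rather than a full weight matrix.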

Multitask Prompted Training Enables Zero-Shot Task Generalization

A system for easily mapping any natural language task into a human-readable prompted form is developed, and a pretrained encoder-decoder model is fine-tuned on this multitask mixture covering a wide variety of tasks.

ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning

This paper introduces ExMix (Extreme Mixture): a massive collection of 107 supervised NLP tasks across diverse domains and task-families, and proposes ExT5: a model pre-trained using a multi-task objective of self-supervised span denoising and supervised ExMix.

Nearest Neighbor Zero-Shot Inference

KNN-Prompt, a simple kNN-LM with automatically expanded fuzzy verbalizers, is introduced; it is effective for domain adaptation with no further training, and its gains increase with the size of the retrieval model.
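The two ingredients of this approach — mixing a parametric LM's label distribution with a distribution derived from retrieved nearest neighbors, and a fuzzy verbalizer that pools probability over several surface tokens per label — can be sketched as follows. The verbalizer mapping and probabilities here are illustrative assumptions, not the paper's actual data:

```python
def fuzzy_verbalize(token_probs, verbalizers):
    """Fuzzy verbalizer: sum next-token probability over all surface
    tokens mapped to each label (e.g. 'great'/'good' both -> pos)."""
    return {label: sum(token_probs.get(t, 0.0) for t in tokens)
            for label, tokens in verbalizers.items()}

def interpolate(p_lm, p_knn, lam):
    """kNN-LM-style mixing: lam * kNN distribution + (1-lam) * LM."""
    return {y: lam * p_knn.get(y, 0.0) + (1 - lam) * p_lm.get(y, 0.0)
            for y in sorted(set(p_lm) | set(p_knn))}

verbalizers = {"pos": ["great", "good"], "neg": ["bad", "awful"]}
p_lm = fuzzy_verbalize({"great": 0.3, "good": 0.2, "bad": 0.5},
                       verbalizers)                # {"pos": 0.5, "neg": 0.5}
p_knn = {"pos": 0.9, "neg": 0.1}   # from retrieved nearest neighbors
mixed = interpolate(p_lm, p_knn, lam=0.5)
print(mixed)  # pos: 0.5*0.9 + 0.5*0.5 = 0.70; neg: 0.30
```

Because the kNN component needs no gradient updates, the adaptation is training-free: only the datastore changes with the domain.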

Unsupervised Cross-Task Generalization via Retrieval Augmentation

This paper proposes a retrieval-augmentation method named ReCross that takes a few unlabelled examples as queries to retrieve a small subset of upstream data and uses them to update the multi-task model for better generalization.

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

A general-purpose fine-tuning recipe for retrieval-augmented generation (RAG) -- models which combine pre-trained parametric and non-parametric memory for language generation, and finds that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.

Parameter-Efficient Transfer Learning for NLP

To demonstrate adapters' effectiveness, the recently proposed BERT Transformer model is transferred to 26 diverse text classification tasks, including the GLUE benchmark, and adapters attain near state-of-the-art performance whilst adding only a few parameters per task.
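The adapter module this work describes is a small bottleneck inserted into each frozen Transformer layer: down-project, nonlinearity, up-project, residual connection. A minimal sketch, using plain Python lists in place of tensors and ReLU as an assumed nonlinearity:

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def adapter(x, W_down, W_up):
    """Bottleneck adapter: down-project the frozen layer's output to a
    small dimension, apply a nonlinearity, project back up, and add a
    residual connection so the adapter can default to the identity."""
    h = relu(matvec(W_down, x))
    out = matvec(W_up, h)
    return [xi + oi for xi, oi in zip(x, out)]

# 4-dim hidden state, bottleneck of size 2; near-zero initialization
# keeps the adapted model close to the pretrained one at the start.
x = [1.0, -1.0, 2.0, 0.5]
W_down = [[0.0] * 4, [0.0] * 4]       # 2x4 down-projection
W_up = [[0.0] * 2 for _ in range(4)]  # 4x2 up-projection
assert adapter(x, W_down, W_up) == x  # identity at initialization
```

The parameter saving is the point: with hidden size d and bottleneck m << d, each adapter adds roughly 2dm weights per layer instead of retraining all of the layer's parameters.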