SPT: Semi-Parametric Prompt Tuning for Multitask Prompted Learning

M Saiful Bari, Aston Zhang, Shuai Zheng, Xingjian Shi, Yi Zhu, Shafiq R. Joty, Mu Li
Pre-trained large language models can efficiently interpolate human-written prompts in a natural way. Multitask prompted learning can help generalization through a diverse set of tasks at once, thus enhancing the potential for more effective downstream fine-tuning. To perform efficient multitask inference in the same batch, parameter-efficient fine-tuning methods such as prompt tuning have been proposed. However, the existing prompt tuning methods may lack generalization. We propose SPT, a semi… 
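Prompt tuning, the parameter-efficient method the abstract builds on, keeps the pre-trained model frozen and learns only a small matrix of "soft prompt" vectors that is prepended to the input embeddings. A minimal sketch, assuming NumPy and hypothetical toy dimensions (none of the names below come from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions for illustration.
d_model, prompt_len, seq_len = 16, 4, 10

frozen_token_embeds = rng.normal(size=(seq_len, d_model))  # from the frozen LM
soft_prompt = np.zeros((prompt_len, d_model))              # the only trainable tensor

def with_soft_prompt(token_embeds: np.ndarray, prompt: np.ndarray) -> np.ndarray:
    """Prepend trainable soft-prompt vectors to the (frozen) token embeddings."""
    return np.concatenate([prompt, token_embeds], axis=0)

batch_input = with_soft_prompt(frozen_token_embeds, soft_prompt)
print(batch_input.shape)  # (prompt_len + seq_len, d_model) -> (14, 16)
```

Because only `soft_prompt` receives gradients, many tasks can share one frozen backbone and differ only in their few-thousand-parameter prompts, which is what makes mixed-task batches cheap.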

Parameter-Efficient Fine-Tuning Design Spaces

This work presents a parameter-efficient fine-tuning design paradigm, discovers design patterns that apply across different experimental settings, and shows experimentally that the resulting methods consistently and significantly outperform the investigated parameter-efficient fine-tuning strategies across different backbone models and tasks in natural language processing.



When to Use Multi-Task Learning vs Intermediate Fine-Tuning for Pre-Trained Encoder Transfer Learning

A simple heuristic determines when to use each: pairwise MTL is better than STILTs when the target task has fewer instances than the supporting task, and vice versa. This holds in more than 92% of applicable cases on the GLUE dataset and is validated with experiments varying dataset size.

LoRA: Low-Rank Adaptation of Large Language Models

Low-Rank Adaptation, or LoRA, is proposed, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.
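The LoRA idea summarized above can be sketched in a few lines: the frozen weight is left untouched, and a trainable rank-r update B·A (with B zero-initialized so training starts from the pre-trained behavior) is added to the layer's output. A minimal NumPy sketch with hypothetical toy sizes, not the reference implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 32, 32, 4, 8   # hypothetical sizes; r << d_in, d_out

W0 = rng.normal(size=(d_out, d_in))    # pre-trained weight, frozen
A = rng.normal(size=(r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))               # trainable, zero init => no change at start

def lora_forward(x: np.ndarray) -> np.ndarray:
    # W0 @ x stays fixed; only the low-rank update B @ A would receive gradients.
    return W0 @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# With B initialized to zero, LoRA output equals the frozen layer's output.
assert np.allclose(lora_forward(x), W0 @ x)
```

Only `A` and `B` are trained (2·r·d parameters instead of d²), and after training the update can be merged into `W0`, so inference costs nothing extra.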

On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation

It is demonstrated that 1) adapter-based tuning outperforms fine-tuning on low-resource and cross-lingual tasks; 2) it is more robust to overfitting and less sensitive to changes in learning rates.

BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models

We introduce BitFit, a sparse fine-tuning method in which only the bias terms of the model (or a subset of them) are modified. We show that with small-to-medium training data, applying BitFit on
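Mechanically, BitFit amounts to partitioning the model's parameters by name and marking only the bias tensors as trainable. A minimal sketch of that selection step, assuming NumPy and a hypothetical two-layer parameter dictionary (the names are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer model: weight matrices stay frozen, biases are tuned.
params = {
    "layer1.weight": rng.normal(size=(8, 4)),
    "layer1.bias":   np.zeros(8),
    "layer2.weight": rng.normal(size=(2, 8)),
    "layer2.bias":   np.zeros(2),
}

# BitFit-style split: train only parameters whose name marks them as a bias.
trainable = {k: v for k, v in params.items() if k.endswith(".bias")}
frozen    = {k: v for k, v in params.items() if not k.endswith(".bias")}

n_train = sum(v.size for v in trainable.values())
n_total = sum(v.size for v in params.values())
print(f"training {n_train}/{n_total} parameters")  # training 10/58 parameters
```

In a real framework the same effect is achieved by disabling gradients on the non-bias tensors; the fraction of trainable parameters shrinks further as layer widths grow, since biases scale linearly while weights scale quadratically.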

Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences

The dataset is the first to study multi-sentence inference at scale, with an open-ended set of question types that requires reasoning skills, and finds human solvers to achieve an F1-score of 88.1%.

WINOGRANDE: An Adversarial Winograd Schema Challenge at Scale

This work introduces WinoGrande, a large-scale dataset of 44k problems, inspired by the original WSC design, but adjusted to improve both the scale and the hardness of the dataset, and establishes new state-of-the-art results on five related benchmarks.

HellaSwag: Can a Machine Really Finish Your Sentence?

The construction of HellaSwag, a new challenge dataset, and its resulting difficulty shed light on the inner workings of deep pretrained models and suggest a new path forward for NLP research, in which benchmarks co-evolve adversarially with the state of the art so as to present ever-harder challenges.

Automatic Chain of Thought Prompting in Large Language Models

An automatic CoT prompting method that samples questions with diversity and generates reasoning chains to construct demonstrations; it consistently matches or exceeds the performance of the CoT paradigm that requires manual design of demonstrations.

Chain of Thought Prompting Elicits Reasoning in Large Language Models

Experiments on three large language models show that chain-of-thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks.

Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with 1/n Parameters

This work proposes parameterizing hypercomplex multiplications, allowing models to learn multiplication rules from data regardless of whether such rules are predefined, and providing more architectural flexibility with as few as 1/n of the learnable parameters of the fully-connected layer counterpart.
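The 1/n parameter saving comes from building the layer's weight as a sum of n Kronecker products between small learned "rule" matrices and weight blocks, rather than storing a dense matrix. A minimal NumPy sketch of that construction, with hypothetical toy dimensions (this is an illustration of the Kronecker-sum parameterization, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out = 4, 64, 64   # n must divide both dimensions

# Learned "rule" matrices A_i (n x n) and weight blocks S_i (d_out/n x d_in/n);
# the effective dense weight is the sum of their Kronecker products.
A = rng.normal(size=(n, n, n))
S = rng.normal(size=(n, d_out // n, d_in // n))

W = sum(np.kron(A[i], S[i]) for i in range(n))  # shape (d_out, d_in)
print(W.shape)  # (64, 64)

# Parameter count vs. a dense fully-connected layer:
phm_params = A.size + S.size   # n^3 + d_out * d_in / n
dense_params = d_out * d_in
print(phm_params, dense_params)  # 1088 vs 4096
```

The stored parameters number roughly d_out·d_in/n (plus a small n³ term for the rule matrices), approaching the advertised 1/n of a dense layer once the layer dimensions dominate n.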