SPT: Semi-Parametric Prompt Tuning for Multitask Prompted Learning
@article{Bari2022SPTSP,
  title   = {SPT: Semi-Parametric Prompt Tuning for Multitask Prompted Learning},
  author  = {M Saiful Bari and Aston Zhang and Shuai Zheng and Xingjian Shi and Yi Zhu and Shafiq R. Joty and Mu Li},
  journal = {ArXiv},
  year    = {2022},
  volume  = {abs/2212.10929}
}
Pre-trained large language models can efficiently interpolate human-written prompts in a natural way. Multitask prompted learning can help generalization through a diverse set of tasks at once, thus enhancing the potential for more effective downstream fine-tuning. To perform efficient multitask inference in the same batch, parameter-efficient fine-tuning methods such as prompt tuning have been proposed. However, the existing prompt tuning methods may lack generalization. We propose SPT, a semi…
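For context, here is a minimal sketch of conventional prompt tuning, the baseline the abstract refers to: a small set of trainable soft-prompt vectors is prepended to the frozen backbone's input embeddings, and only those vectors are updated during multitask training. This is not the SPT method itself; the module name, prompt length, and embedding size below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Trainable soft-prompt vectors prepended to (frozen) input embeddings."""

    def __init__(self, num_prompt_tokens: int = 20, embed_dim: int = 768):
        super().__init__()
        # Only these vectors receive gradients; the backbone stays frozen.
        self.prompt = nn.Parameter(torch.randn(num_prompt_tokens, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, embed_dim)
        batch_size = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch_size, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)
```

Because only the prompt parameters differ across tasks, examples from different tasks can share the same frozen backbone within one batch.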
One Citation
Parameter-Efficient Fine-Tuning Design Spaces
- Computer Science · ArXiv
- 2023
This work presents a parameter-efficient fine-tuning design paradigm and discovers design patterns that apply across different experimental settings, showing experimentally that the resulting methods consistently and significantly outperform the investigated parameter-efficient fine-tuning strategies across different backbone models and different natural language processing tasks.
References
When to Use Multi-Task Learning vs Intermediate Fine-Tuning for Pre-Trained Encoder Transfer Learning
- Computer Science · ACL
- 2022
A simple heuristic is given for when to use pairwise MTL versus STILTs: MTL is better when the target task has fewer instances than the supporting task, and vice versa. This holds in more than 92% of applicable cases on the GLUE dataset and is validated with experiments that vary dataset size.
LoRA: Low-Rank Adaptation of Large Language Models
- Computer Science · ICLR
- 2022
Low-Rank Adaptation, or LoRA, is proposed, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.
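A minimal sketch of the mechanism summarized above: the pre-trained weight is frozen and a trainable low-rank update B·A is added to its output, scaled by alpha/r. The class name, rank, and scaling values below are illustrative assumptions, not the reference implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (LoRA-style)."""

    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # freeze the pre-trained weight
        self.base.bias.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```

Only lora_A and lora_B are trained, so the trainable parameter count per layer is r·(in_features + out_features) instead of in_features·out_features.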
On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation
- Computer Science · ACL
- 2021
It is demonstrated that 1) adapter-based tuning outperforms fine-tuning on low-resource and cross-lingual tasks; 2) it is more robust to overfitting and less sensitive to changes in learning rates.
BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models
- Computer Science · ACL
- 2022
We introduce BitFit, a sparse fine-tuning method in which only the bias terms of the model (or a subset of them) are modified. We show that with small-to-medium training data, applying BitFit on…
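A minimal sketch of the idea: freeze every parameter except the bias terms. The helper below is an illustrative assumption (in practice a task-specific head is usually also left trainable).

```python
import torch.nn as nn

def apply_bitfit(model: nn.Module) -> None:
    """Freeze all parameters except bias terms, in the spirit of BitFit."""
    for name, param in model.named_parameters():
        param.requires_grad = name.endswith(".bias") or name == "bias"

# Example: only the two bias vectors of this tiny MLP remain trainable.
mlp = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
apply_bitfit(mlp)
print([n for n, p in mlp.named_parameters() if p.requires_grad])  # ['0.bias', '2.bias']
```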
Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences
- Computer Science · NAACL
- 2018
This dataset is the first to study multi-sentence inference at scale, with an open-ended set of question types that require reasoning skills; human solvers achieve an F1-score of 88.1%.
WINOGRANDE: An Adversarial Winograd Schema Challenge at Scale
- Computer Science · AAAI
- 2020
This work introduces WinoGrande, a large-scale dataset of 44k problems, inspired by the original WSC design, but adjusted to improve both the scale and the hardness of the dataset, and establishes new state-of-the-art results on five related benchmarks.
HellaSwag: Can a Machine Really Finish Your Sentence?
- Computer Science · ACL
- 2019
The construction of HellaSwag, a new challenge dataset, and its resulting difficulty, sheds light on the inner workings of deep pretrained models, and suggests a new path forward for NLP research, in which benchmarks co-evolve with the evolving state-of-the-art in an adversarial way, so as to present ever-harder challenges.
Automatic Chain of Thought Prompting in Large Language Models
- Computer Science, Physics · ArXiv
- 2022
An automatic CoT prompting method is proposed that samples questions with diversity and generates reasoning chains to construct demonstrations; it consistently matches or exceeds the performance of the CoT paradigm that requires manually designed demonstrations.
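A rough sketch of the demonstration-construction step described above, under stated assumptions: `embed` and `generate` are hypothetical stand-ins for a sentence encoder and a language-model call (neither comes from the paper's code). Questions are clustered for diversity, one representative per cluster is answered with a "Let's think step by step." prompt, and the generated chains become the demonstrations.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_cot_demos(questions, embed, generate, k: int = 4) -> str:
    """Construct CoT demonstrations from diverse questions (illustrative sketch)."""
    vectors = np.stack([embed(q) for q in questions])
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(vectors)
    demos = []
    for cluster in range(k):
        # Pick one representative question per cluster for diversity.
        q = questions[int(np.argmax(labels == cluster))]
        rationale = generate(f"Q: {q}\nA: Let's think step by step.")
        demos.append(f"Q: {q}\nA: Let's think step by step. {rationale}")
    return "\n\n".join(demos)
```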
Chain of Thought Prompting Elicits Reasoning in Large Language Models
- Computer Science · ArXiv
- 2022
Experiments on three large language models show that chain-of-thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks.
Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with 1/n Parameters
- Computer Science · ICLR
- 2021
This work proposes parameterizing hypercomplex multiplications, allowing models to learn multiplication rules from data regardless of whether such rules are predefined, and providing more architectural flexibility with only 1/n learnable parameters (for arbitrary n) compared with the fully connected layer counterpart.
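A minimal sketch of a parameterized hypercomplex multiplication (PHM) linear layer in this spirit: the weight is assembled as a sum of Kronecker products, W = sum_i A_i ⊗ S_i, so the parameter count is roughly 1/n of a dense layer's. The class name, initialization, and sizes below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PHMLinear(nn.Module):
    """Linear layer whose weight is a sum of n Kronecker products (PHM-style)."""

    def __init__(self, in_features: int, out_features: int, n: int = 4):
        super().__init__()
        assert in_features % n == 0 and out_features % n == 0
        self.n = n
        # n small "rule" matrices (n x n) and n weight blocks (out/n x in/n).
        self.A = nn.Parameter(torch.randn(n, n, n) * 0.1)
        self.S = nn.Parameter(torch.randn(n, out_features // n, in_features // n) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assemble the full (out_features x in_features) weight from Kronecker products.
        W = sum(torch.kron(self.A[i], self.S[i]) for i in range(self.n))
        return x @ W.T + self.bias
```

The parameter count is roughly in_features·out_features/n plus n³ for the rule matrices, versus in_features·out_features for a dense layer.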