• Corpus ID: 239009562

Multitask Prompted Training Enables Zero-Shot Task Generalization

@inproceedings{sanh2022multitask,
  title={Multitask Prompted Training Enables Zero-Shot Task Generalization},
  author={Victor Sanh and Albert Webson and Colin Raffel and Stephen H. Bach and Lintang A. Sutawika and Zaid Alyafeai and Antoine Chaffin and Arnaud Stiegler and Teven Le Scao and Arun Raja and Manan Dey and M Saiful Bari and Canwen Xu and Urmish Thakker and Shanya Sharma and Eliza Szczechla and Taewoon Kim and Gunjan Chhablani and Nihal V. Nayak and Debajyoti Datta and Jonathan Chang and Mike Tian-Jian Jiang and Han Wang and Matteo Manica and Sheng Shen and Zheng Xin Yong and Harshit Pandey and Rachel Bawden and Thomas Wang and Trishala Neeraj and Jos Rozen and Abheesht Sharma and Andrea Santilli and Thibault F{\'e}vry and Jason Alan Fries and Ryan Teehan and Stella Rose Biderman and Leo Gao and T. G. Owe Bers and Thomas Wolf and Alexander M. Rush},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2022}
}
Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that this is a consequence of implicit multitask learning in language models' pretraining (Radford et al., 2019). Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping any natural language tasks into a human-readable prompted… 
Co-training Improves Prompt-based Learning for Large Language Models
It is demonstrated that co-training (Blum & Mitchell, 1998) can improve the performance of prompt-based learning by using unlabeled data, and that co-training makes it possible to improve the original prompt model while simultaneously learning a smaller, downstream task-specific model.
  • 2022
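The co-training idea cited above can be illustrated with a minimal sketch: models trained on two different "views" of the data pseudo-label unlabeled examples for each other. The 1-D threshold classifiers below are toy stand-ins for illustration only, not the paper's prompt-based models, and all names are hypothetical.

```python
def train_threshold(examples):
    """Fit a trivial 1-D classifier: threshold halfway between class means."""
    pos = [x for x, y in examples if y == 1]
    neg = [x for x, y in examples if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(threshold, x):
    return 1 if x >= threshold else 0

def co_train_round(labeled_a, labeled_b, unlabeled):
    """One co-training round over two views.

    labeled_a / labeled_b: lists of (feature, label) pairs, one per view.
    unlabeled: list of (feature_a, feature_b) pairs without labels.
    Returns the retrained threshold for each view.
    """
    t_a = train_threshold(labeled_a)
    t_b = train_threshold(labeled_b)
    for xa, xb in unlabeled:
        # Each view's model supplies pseudo-labels to the other view's
        # training set, letting unlabeled data improve both models.
        labeled_b.append((xb, predict(t_a, xa)))
        labeled_a.append((xa, predict(t_b, xb)))
    return train_threshold(labeled_a), train_threshold(labeled_b)

# Two labeled seed examples per view plus one unlabeled pair.
t_a, t_b = co_train_round([(0.0, 0), (1.0, 1)],
                          [(0.0, 0), (1.0, 1)],
                          [(0.9, 0.8)])
```

The key assumption, per Blum & Mitchell, is that the two views are each sufficient for classification yet conditionally independent, so one model's confident predictions carry new information for the other.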
Foundational Models for Continual Learning: An Empirical Study of Latent Replay
This study examines the efficacy of pre-trained vision models as a foundation for downstream continual learning (CL) scenarios, and shows how transfer, forgetting, task similarity, and learning depend on the input data characteristics and not necessarily on the CL algorithms.
Standing on the Shoulders of Giant Frozen Language Models
The authors condense relevant information from 100+ retrieved documents into the input sequence length of a frozen LM reader, and show that this approach can reach and surpass leading fine-tuning approaches on Natural…
Reframing Human-AI Collaboration for Generating Free-Text Explanations
A pipeline is created that combines GPT-3 with a supervised filter incorporating binary acceptability judgments from humans in the loop, and it is demonstrated that acceptability is partially correlated with various fine-grained attributes of explanations.
PaLM: Scaling Language Modeling with Pathways
A 540-billion-parameter, densely activated Transformer language model called PaLM achieves breakthrough performance, outperforming the state of the art on a suite of multi-step reasoning tasks and surpassing average human performance on the recently released BIG-bench benchmark.
Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5)
A flexible and unified text-to-text paradigm called P5 unifies various recommendation tasks in a shared framework. P5 has the potential to serve as a foundation model for downstream recommendation tasks, allows easy integration with other modalities, and enables instruction-based recommendation, moving recommender systems toward a universal recommendation engine.
CheckDST: Measuring Real-World Generalization of Dialogue State Tracking Performance
It is argued that models should be assessed more holistically rather than by pursuing state-of-the-art joint goal accuracy (JGA), since a higher JGA does not guarantee better overall robustness. A collection of metrics called CheckDST is designed to facilitate comparisons of DST models along comprehensive dimensions of robustness by testing well-known weaknesses with augmented test sets.
Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning
A new parameter-efficient fine-tuning method called (IA)³ scales activations by learned vectors, attaining stronger performance while introducing only a relatively tiny number of new parameters.
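The core mechanism of (IA)³ (rescaling a frozen layer's activations elementwise by a learned vector) can be sketched in a few lines. This is a hedged illustration of the idea only, not the authors' implementation; the function and variable names are hypothetical.

```python
def ia3_scale(activations, learned_vector):
    """Elementwise rescaling of a frozen layer's activations.

    activations: list of floats produced by a frozen layer.
    learned_vector: list of floats, one learned scale per activation
    dimension. Only this small vector is trained, which is why the
    method adds so few new parameters.
    """
    assert len(activations) == len(learned_vector)
    return [a * s for a, s in zip(activations, learned_vector)]

# Initializing the learned vector to all ones makes the rescaling a
# no-op at the start of fine-tuning, so the frozen model's behavior is
# preserved and training only learns deviations from 1.0.
frozen_output = [0.5, -1.2, 3.0]
scales = [1.0, 1.0, 1.0]  # freshly initialized
print(ia3_scale(frozen_output, scales))  # identical to frozen_output
```

Because only the per-dimension scale vectors are updated, the number of trainable parameters grows with the layer width rather than with the full weight matrices.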
What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
A large-scale evaluation of modeling choices and their impact on zero-shot generalization shows that causal decoder-only models trained with an autoregressive language modeling objective exhibit the strongest zero-shot generalization after purely unsupervised pretraining, but that models with non-causal visibility on their input perform best in the authors' experiments.