• Corpus ID: 239009562

Multitask Prompted Training Enables Zero-Shot Task Generalization

@article{Sanh2021MultitaskPT,
  title={Multitask Prompted Training Enables Zero-Shot Task Generalization},
  author={Victor Sanh and Albert Webson and Colin Raffel and Stephen H. Bach and Lintang A. Sutawika and Zaid Alyafeai and Antoine Chaffin and Arnaud Stiegler and Teven Le Scao and Arun Raja and Manan Dey and M Saiful Bari and Canwen Xu and Urmish Thakker and Shanya Sharma Sharma and Eliza Szczechla and Taewoon Kim and Gunjan Chhablani and Nihal V. Nayak and Debajyoti Datta and Jonathan Chang and Mike Tian-Jian Jiang and Han Wang and Matteo Manica and Sheng Shen and Zheng Xin Yong and Harshit Pandey and Rachel Bawden and Thomas Wang and Trishala Neeraj and Jos Rozen and Abheesht Sharma and Andrea Santilli and Thibault F{\'e}vry and Jason Alan Fries and Ryan Teehan and Stella Rose Biderman and Leo Gao and Tali Bers and Thomas Wolf and Alexander M. Rush},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.08207}
}
Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that this is a consequence of implicit multitask learning in language models' pretraining (Radford et al., 2019). Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping any natural language tasks into a human-readable, prompted… 
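The "prompted form" described here is essentially a templating step: each labeled example is rewritten into plain input/target text before multitask fine-tuning. A minimal sketch of that idea follows, assuming a hypothetical NLI example layout and template wording; it illustrates the mapping only, and is not the authors' actual templating toolkit.

# Minimal sketch: map a raw NLI example to the (prompt, target) text pair used
# for prompted multitask training. Field names and wording are assumptions.
def apply_template(example: dict) -> tuple[str, str]:
    prompt = (
        f"Suppose \"{example['premise']}\" "
        f"Can we infer that \"{example['hypothesis']}\"? Yes, no, or maybe?"
    )
    # Verbalize the integer label (entailment / neutral / contradiction).
    target = ["Yes", "Maybe", "No"][example["label"]]
    return prompt, target

example = {
    "premise": "A dog is chasing a ball in the park.",
    "hypothesis": "An animal is outdoors.",
    "label": 0,  # entailment
}
print(apply_template(example))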
Reframing Human-AI Collaboration for Generating Free-Text Explanations
TLDR
A pipeline is created that combines GPT-3 with a supervised filter incorporating humans in the loop via binary acceptability judgments; despite the significant subjectivity intrinsic to judging acceptability, this approach consistently filters for GPT-3-generated explanations that humans deem acceptable.
CheckDST: Measuring Real-World Generalization of Dialogue State Tracking Performance
TLDR
It is argued that models should be assessed more holistically rather than by pursuing state of the art on joint goal accuracy (JGA), since higher JGA does not guarantee better overall robustness, and a collection of metrics called CheckDST is designed to facilitate comparison of DST models along comprehensive dimensions of robustness by testing well-known weaknesses with augmented test sets.
UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models
TLDR
The UnifiedSKG framework is proposed, which unifies 21 structured knowledge grounding (SKG) tasks into a text-to-text format, aiming to promote systematic SKG research rather than work that is exclusive to a single task, domain, or dataset.
ZeroPrompt: Scaling Prompt-Based Pretraining to 1,000 Tasks Improves Zero-Shot Generalization
  • Hanwei Xu, Yujun Chen, +4 authors Zhilin Yang
  • Computer Science
    ArXiv
  • 2022
TLDR
The results show that task scaling can substantially improve training efficiency, by a factor of 30 in FLOPs, and the paper further proposes a prompting method that incorporates a genetic algorithm to automatically search for the best prompt for unseen tasks, along with a few other improvements.
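As a rough illustration of genetic prompt search, the sketch below evolves short prompts via selection, crossover, and mutation; the prompt pieces, the operators, and the stand-in fitness function are assumptions for illustration, not the paper's implementation (a real run would score each candidate by a frozen model's zero-shot accuracy on held-out examples).

import random

random.seed(0)

PIECES = ["Question:", "Answer briefly.", "Given the passage,",
          "Respond with yes or no.", "Context:", "Think it through."]

def fitness(prompt: list[str]) -> int:
    # Toy deterministic stand-in; in practice this would be dev-set accuracy
    # of a frozen language model conditioned on " ".join(prompt) + task input.
    return sum(len(piece) for piece in prompt) % 7

def mutate(prompt: list[str]) -> list[str]:
    child = list(prompt)
    child[random.randrange(len(child))] = random.choice(PIECES)
    return child

def crossover(a: list[str], b: list[str]) -> list[str]:
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

# Evolve a small population of three-piece prompts for a few generations.
population = [[random.choice(PIECES) for _ in range(3)] for _ in range(8)]
for _ in range(10):
    population.sort(key=fitness, reverse=True)
    parents = population[:4]  # selection: keep the fittest half
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(len(population) - len(parents))]
    population = parents + children

print(" ".join(max(population, key=fitness)))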
A General Language Assistant as a Laboratory for Alignment
TLDR
A ‘preference model pre-training’ stage of training is studied with the goal of improving sample efficiency when fine-tuning on human preferences, and scaling trends are investigated for several training objectives relevant to alignment.
An Explanation of In-context Learning as Implicit Bayesian Inference
TLDR
This paper proves that in-context learning occurs implicitly via Bayesian inference of the latent concept when the pretraining distribution is a mixture of hidden Markov models (HMMs), and it introduces a family of small-scale synthetic datasets (GINC) on which both Transformer and LSTM language models exhibit in-context learning.
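In equation form (notation added here for clarity, not taken verbatim from the paper), this framing treats in-context prediction as marginalizing over a latent concept \theta:

p(y \mid \mathrm{prompt}) \;=\; \int_{\theta} p(y \mid \theta, \mathrm{prompt})\, p(\theta \mid \mathrm{prompt})\, d\theta

As in-context examples sharing a single underlying concept \theta^{*} accumulate, the posterior p(\theta \mid \mathrm{prompt}) concentrates on \theta^{*}, which is the sense in which the model locates the task implicitly via Bayesian inference.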
CRASS: A Novel Data Set and Benchmark to Test Counterfactual Reasoning of Large Language Models
TLDR
The CRASS data set design and benchmark are introduced, together with an accompanying API that supports scoring against a crowd-validated human baseline, and the benchmark is shown to pose a valid challenge for large language models while opening up considerable room for their improvement.
Cross-Task Generalization via Natural Language Crowdsourcing Instructions
TLDR
This work introduces NATURALINSTRUCTIONS, a dataset of 61 distinct tasks, their human-authored instructions, and 193k task instances, and adopts generative pre-trained language models to encode task-specific instructions along with the input and generate the task output.
Domain Adaptation with Pre-trained Transformers for Query Focused Abstractive Text Summarization
TLDR
This paper applies a variety of techniques to pre-trained transformer-based summarization models, including transfer learning, weakly supervised learning, and distant supervision, to generate abstractive summaries for the Query Focused Text Summarization task.
Few-shot Learning with Multilingual Language Models
TLDR
A detailed analysis of where the model succeeds and fails is presented, showing in particular that it enables cross-lingual in-context learning on some tasks, while there is still room for improvement in surface-form robustness and in adaptation to tasks that do not have a natural cloze form.