MVP: Multi-task Supervised Pre-training for Natural Language Generation

Tianyi Tang, Junyi Li, Wayne Xin Zhao, Ji-rong Wen

Pre-trained language models (PLMs) have achieved notable success in natural language generation (NLG) tasks. Up to now, most PLMs have been pre-trained in an unsupervised manner using a large-scale general corpus. Meanwhile, an increasing number of models pre-trained with less labeled data showcase superior performance compared to unsupervised models. Motivated by the success of supervised pre-training, we propose Multi-task superVised Pre-training (MVP) for natural language…



Unified Language Model Pre-training for Natural Language Understanding and Generation

A new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks, and that compares favorably with BERT on the GLUE benchmark and on the SQuAD 2.0 and CoQA question answering tasks.

GLGE: A New General Language Generation Evaluation Benchmark

The General Language Generation Evaluation (GLGE), a new multi-task benchmark for evaluating the generalization capabilities of NLG models across eight language generation tasks, is presented and a leaderboard with strong baselines including MASS, BART, and ProphetNet is built.

ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning

ExMix (Extreme Mixture), a massive collection of 107 supervised NLP tasks across diverse domains and task families, is introduced, and a model pre-trained using a multi-task objective of self-supervised span denoising and supervised ExMix is proposed.

Multi-Task Deep Neural Networks for Natural Language Understanding

A Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks that allows domain adaptation with substantially fewer in-domain labels than the pre-trained BERT representations.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

Multi-task learning for natural language processing in the 2020s: where are we going?

Improving Language Understanding by Generative Pre-Training

The general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, improving upon the state of the art in 9 out of the 12 tasks studied.

ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation

An enhanced multi-flow sequence to sequence pre-training and fine-tuning framework named ERNIE-GEN, which bridges the discrepancy between training and inference with an infilling generation mechanism and a noise-aware generation method to make generation closer to human writing patterns.

Alternating Language Modeling for Cross-Lingual Pre-Training

This work code-switches sentences of different languages rather than simply concatenating them, aiming to capture the rich cross-lingual context of words and phrases, and shows that ALM can outperform previous pre-training methods on three benchmarks.

Multilingual Denoising Pre-training for Neural Machine Translation

This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART, a