Corpus ID: 147704286

Unified Language Model Pre-training for Natural Language Understanding and Generation

@article{Dong2019UnifiedLM,
  title={Unified Language Model Pre-training for Natural Language Understanding and Generation},
  author={Li Dong and Nan Yang and Wenhui Wang and Furu Wei and Xiaodong Liu and Yu Wang and Jianfeng Gao and M. Zhou and Hsiao-Wuen Hon},
  journal={ArXiv},
  year={2019},
  volume={abs/1905.03197}
}
This paper presents a new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks. UniLM can be fine-tuned as a unidirectional decoder, a bidirectional encoder, or a sequence-to-sequence model to support various downstream natural language understanding and generation tasks. UniLM compares favorably with BERT on the GLUE benchmark and on the SQuAD 2.0 and CoQA question answering tasks. Moreover, our model achieves new state-of…
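
The three fine-tuning configurations differ only in the self-attention mask applied over one shared Transformer: a bidirectional mask for understanding tasks, a left-to-right mask for generation, and a mixed mask for sequence-to-sequence tasks. A minimal sketch of how such masks can be built, assuming a simple source/target split of the input sequence (the function name and shapes are illustrative, not the released UniLM code):

```python
import torch


def unilm_attention_mask(src_len: int, tgt_len: int, mode: str) -> torch.Tensor:
    """Build an (L, L) mask where 1 = may attend, 0 = blocked, L = src_len + tgt_len."""
    total = src_len + tgt_len
    if mode == "bidirectional":
        # Encoder-style: every token attends to every token.
        return torch.ones(total, total)
    if mode == "unidirectional":
        # Decoder-style: token i attends only to tokens 0..i.
        return torch.tril(torch.ones(total, total))
    if mode == "seq2seq":
        mask = torch.zeros(total, total)
        mask[:, :src_len] = 1.0  # all positions may attend to the source segment
        mask[src_len:, src_len:] = torch.tril(torch.ones(tgt_len, tgt_len))  # causal target
        # Source positions never attend to the target segment (left at 0 above).
        return mask
    raise ValueError(f"unknown mode: {mode}")


# Example: 3 source tokens and 2 target tokens in sequence-to-sequence mode.
print(unilm_attention_mask(3, 2, "seq2seq"))
```

In sequence-to-sequence mode the source segment is encoded bidirectionally while each target token attends only to the source and to earlier target tokens, which is what lets a single set of parameters act as encoder, decoder, or both.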

Citations

Cross-Lingual Natural Language Generation via Pre-Training

Experimental results on question generation and abstractive summarization show that the model outperforms the machine-translation-based pipeline methods for zero-shot cross-lingual generation and improves NLG performance of low-resource languages by leveraging rich-resource language data.

Unified BERT for Few-shot Natural Language Understanding

This paper proposes UBERT, a unified bidirectional language understanding model based on the BERT framework, which universally models the training objectives of different NLU tasks through a biaffine network, enhancing its ability to capture common semantic understanding.

Multi-Task Deep Neural Networks for Natural Language Understanding

A Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks that allows domain adaptation with substantially fewer in-domain labels than the pre-trained BERT representations.
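
The underlying pattern, a single shared encoder feeding small task-specific output heads that are trained jointly across tasks, can be sketched as follows (the tiny Transformer encoder and head sizes below are placeholders; MT-DNN itself builds on a pre-trained BERT encoder):

```python
import torch
import torch.nn as nn


class MultiTaskModel(nn.Module):
    """One shared encoder, one lightweight classification head per NLU task."""

    def __init__(self, hidden_size: int, task_num_labels: dict):
        super().__init__()
        # Placeholder encoder; MT-DNN uses a pre-trained BERT here.
        layer = nn.TransformerEncoderLayer(d_model=hidden_size, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.heads = nn.ModuleDict(
            {task: nn.Linear(hidden_size, n) for task, n in task_num_labels.items()}
        )

    def forward(self, embeddings: torch.Tensor, task: str) -> torch.Tensor:
        hidden = self.encoder(embeddings)   # (batch, seq, hidden)
        pooled = hidden[:, 0]               # first token as a sentence summary
        return self.heads[task](pooled)     # task-specific logits


model = MultiTaskModel(hidden_size=64, task_num_labels={"mnli": 3, "sst2": 2})
logits = model(torch.randn(8, 16, 64), task="mnli")  # shape (8, 3)
```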

MVP: Multi-task Supervised Pre-training for Natural Language Generation

This work collects a large-scale natural language generation corpus, MVPCorpus, from 77 datasets over 11 diverse NLG tasks, and unifies these examples into a general text-to-text format to pre-train the text generation model MVP in a supervised manner.
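
Unifying heterogeneous NLG datasets into a text-to-text format typically amounts to serializing each example's input as a prompt string and keeping its output as plain target text. A hedged sketch of that conversion step, with hypothetical field names rather than MVPCorpus's actual schema:

```python
def to_text_to_text(task, example):
    """Serialize one example from a hypothetical per-task schema into (source, target) text."""
    if task == "summarization":
        return f"Summarize: {example['document']}", example["summary"]
    if task == "question_generation":
        src = f"Generate a question. Answer: {example['answer']} Context: {example['context']}"
        return src, example["question"]
    if task == "data_to_text":
        table = " | ".join(f"{k}: {v}" for k, v in example["record"].items())
        return f"Describe: {table}", example["description"]
    raise ValueError(f"unsupported task: {task}")


src, tgt = to_text_to_text(
    "data_to_text",
    {"record": {"model": "UniLM", "year": 2019},
     "description": "UniLM was released in 2019."},
)
```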

CopyBERT: A Unified Approach to Question Generation with Self-Attention

It is shown that information from the self-attentions of BERT is useful for language modeling of questions conditioned on paragraph and answer phrases; to control the attention span, a semi-diagonal mask is used, and a shared model is utilized for encoding and decoding, unlike in sequence-to-sequence approaches.

Pre-training Text-to-Text Transformers for Concept-centric Common Sense

It is shown that, while only incrementally pre-trained on a relatively small corpus for a few steps, CALM outperforms baseline methods by a consistent margin and is even comparable with some larger PTLMs, which suggests that CALM can serve as a general, plug-and-play method for improving the commonsense reasoning ability of a PTLM.

CONCEPT-CENTRIC COMMON SENSE

This paper proposes both generative and contrastive objectives for learning common sense from the text, and uses them as intermediate self-supervised learning tasks for incrementally pre-training PTLMs (before task-specific fine-tuning on downstream datasets).

Unified Vision-Language Pre-Training for Image Captioning and VQA

VLP is the first reported model that achieves state-of-the-art results on both vision-language generation and understanding tasks, as disparate as image captioning and visual question answering, across three challenging benchmark datasets: COCO Captions, Flickr30k Captions and VQA 2.0.

Multi-Lingual Question Generation with Language Agnostic Language Model

A language-agnostic language model is developed, which learns the shared representation from several languages in a single architecture, and an adversarial training objective is proposed to encourage the model to learn both language-specific and language-independent information.

A Survey of Knowledge-Enhanced Pre-trained Language Models

A comprehensive review of Knowledge-Enhanced Pre-trained Language Models (KE-PLMs) is presented to provide a clear insight into this thriving field, introducing appropriate taxonomies for Natural Language Understanding (NLU) and Natural Language Generation (NLG) to highlight the focus of these two kinds of tasks.
...

References

Showing 1-10 of 58 references

Cross-Lingual Natural Language Generation via Pre-Training

Experimental results on question generation and abstractive summarization show that the model outperforms the machine-translation-based pipeline methods for zero-shot cross-lingual generation and improves NLG performance of low-resource languages by leveraging rich-resource language data.

Pre-trained language model representations for language generation

This paper examines different strategies for integrating pre-trained representations into sequence-to-sequence models, applies them to neural machine translation and abstractive summarization, and finds that pre-trained representations are most effective when added to the encoder network, which slows inference by only 14%.

Multi-Task Deep Neural Networks for Natural Language Understanding

A Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks that allows domain adaptation with substantially fewer in-domain labels than the pre-trained BERT representations.

Language Models are Unsupervised Multitask Learners

It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
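
The masked-language-model objective hides a random subset of tokens and asks the model to recover them from the full left and right context. A simplified sketch of the masking step (the 15% rate follows BERT; the 80/10/10 mask/random/keep split is omitted for brevity):

```python
import torch


def mask_tokens(input_ids: torch.Tensor, mask_token_id: int, mask_prob: float = 0.15):
    """Corrupt inputs for masked language modeling; labels are -100 except at masked positions."""
    labels = input_ids.clone()
    masked = torch.rand(input_ids.shape) < mask_prob
    labels[~masked] = -100                 # only masked positions contribute to the loss
    corrupted = input_ids.clone()
    corrupted[masked] = mask_token_id      # model must recover these from both-side context
    return corrupted, labels


ids = torch.randint(5, 1000, (2, 12))      # fake token ids
corrupted, labels = mask_tokens(ids, mask_token_id=103)  # 103 is [MASK] in bert-base-uncased
```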

MASS: Masked Sequence to Sequence Pre-training for Language Generation

This work proposes MAsked Sequence to Sequence pre-training (MASS) for the encoder-decoder based language generation tasks, which achieves the state-of-the-art accuracy on the unsupervised English-French translation, even beating the early attention-based supervised model.
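
Concretely, MASS masks one contiguous span on the encoder side and trains the decoder to reconstruct exactly that span. A simplified sketch of the data construction (masking roughly half the sentence mirrors the paper's default; decoder-side input masking is omitted here):

```python
import random


def mass_example(tokens, mask_token="[MASK]"):
    """Return (encoder_input, decoder_target) with one contiguous span masked out."""
    span_len = max(1, len(tokens) // 2)                    # mask roughly half the sentence
    start = random.randrange(0, len(tokens) - span_len + 1)
    end = start + span_len
    encoder_input = tokens[:start] + [mask_token] * span_len + tokens[end:]
    decoder_target = tokens[start:end]                     # decoder reconstructs only this span
    return encoder_input, decoder_target


enc, dec = mass_example("the transformer generalizes well to other tasks".split())
```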

Improving Language Understanding by Generative Pre-Training

The general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, improving upon the state of the art in 9 out of the 12 tasks studied.

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models, which favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge-transfer across tasks.

Text Summarization with Pretrained Encoders

This paper introduces a novel document-level encoder based on BERT which is able to express the semantics of a document and obtain representations for its sentences and proposes a new fine-tuning schedule which adopts different optimizers for the encoder and the decoder as a means of alleviating the mismatch between the two.
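
Giving the pre-trained encoder and the randomly initialized decoder separate optimizers is straightforward to express in PyTorch. A minimal sketch, with placeholder modules and learning rates rather than the paper's exact schedule:

```python
import torch
import torch.nn as nn

# Placeholder modules standing in for a pre-trained encoder and a fresh decoder.
encoder = nn.Linear(768, 768)
decoder = nn.Linear(768, 30522)

# Smaller learning rate for the pre-trained encoder so fine-tuning does not wash out
# its representations; a larger one for the randomly initialized decoder.
encoder_opt = torch.optim.Adam(encoder.parameters(), lr=2e-5)
decoder_opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)

x = torch.randn(4, 768)
loss = decoder(encoder(x)).pow(2).mean()   # dummy loss, for illustration only
loss.backward()
encoder_opt.step()
decoder_opt.step()
encoder_opt.zero_grad()
decoder_opt.zero_grad()
```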

Attention is All you Need

A new, simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
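
At the core of the Transformer is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A compact sketch:

```python
import math

import torch


def scaled_dot_product_attention(q, k, v, mask=None):
    """softmax(Q K^T / sqrt(d_k)) V, with an optional 0/1 attention mask."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v


q = k = v = torch.randn(2, 5, 16)          # (batch, seq, d_k)
out = scaled_dot_product_attention(q, k, v)
```
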
...