Pre-trained Models for Natural Language Processing: A Survey

Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang
Recently, the emergence of pre-trained models (PTMs) has brought natural language processing (NLP) to a new era. In this survey, we provide a comprehensive review of PTMs for NLP. We first briefly introduce language representation learning and its research progress. Then we systematically categorize existing PTMs based on a taxonomy with four perspectives. Next, we describe how to adapt the knowledge of PTMs to the downstream tasks. Finally, we outline some potential directions of PTMs for… 

Pretrained Language Models for Text Generation: A Survey

This paper presents an overview of the major advances achieved in the topic of pretrained language models for text generation and discusses how to adapt existing PLMs to model different input data and satisfy special properties in the generated text.

A Survey on Bias in Deep NLP

Bias is introduced formally, its treatment in several networks is reviewed in terms of detection and correction, and a strategy for dealing with bias in deep NLP is proposed.

Towards Effective Utilization of Pre-trained Language Models

This thesis proposes MKD, a multi-task knowledge distillation approach in which a large pretrained model serves as teacher and transfers its knowledge to a small student model. The student is distilled on different tasks jointly, so that the distilled model learns a more universal language representation by leveraging cross-task data.

A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios

A structured overview is given of methods that enable learning when training data is sparse, including mechanisms to create additional labeled data, such as data augmentation and distant supervision, as well as transfer learning settings that reduce the need for target supervision.

Advances of Transformer-Based Models for News Headline Generation

Two pretrained Transformer-based models (mBART and BertSumAbs) are fine-tuned for headline generation and achieve new state-of-the-art results on the RIA and Lenta datasets of Russian news.

Knowledge Inheritance for Pre-trained Language Models

A pre-training framework named “knowledge inheritance” (KI) is introduced, and the work explores how knowledge distillation can serve as auxiliary supervision during pre-training to efficiently learn larger PLMs, demonstrating the superiority of KI in training efficiency.

Probing Multilingual Language Models for Discourse

It is found that the XLM-RoBERTa family of models consistently shows the best performance, being good monolingual models while degrading relatively little in a zero-shot setting.

PTR: Prompt Tuning with Rules for Text Classification

This work proposes prompt tuning with rules (PTR) for many-class text classification, applying logic rules to construct prompts from several sub-prompts, which makes it possible to encode prior knowledge of each class into prompt tuning.

A Forgotten Strategy for Pooled Contextualized Embedding Learning

Preliminary experiments on the WNUT-17 task show the effectiveness of the forgotten strategy, and uncover that embeddings that are diverse in terms of cosine similarity are helpful in forming an aggregated embedding.

Localizing Q&A Semantic Parsers for Any Language in a Day

The proposed Semantic Parser Localizer (SPL), a toolkit that leverages Neural Machine Translation (NMT) systems to localize a semantic parser for a new language, enables any software developer to add a new language capability to any QA system for a new domain in less than 24 hours.



Semi-supervised sequence tagging with bidirectional language models

A general semi-supervised approach is presented for adding pretrained context embeddings from bidirectional language models to NLP systems. It is applied to sequence labeling tasks, surpassing previous systems that use other forms of transfer or joint learning with additional labeled data and task-specific gazetteers.

Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction

The proposed method provides an effective way of extracting constituency trees from pre-trained LMs without training, and reports intriguing findings in the induced trees, including the fact that pre-trained LMs outperform other approaches in correctly demarcating adverb phrases in sentences.

Pre-trained language model representations for language generation

This paper examines different strategies for integrating pre-trained representations into sequence-to-sequence models, applies them to neural machine translation and abstractive summarization, and finds that pre-trained representations are most effective when added to the encoder network, which slows inference by only 14%.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

BERT Rediscovers the Classical NLP Pipeline

This work finds that the model represents the steps of the traditional NLP pipeline in an interpretable and localizable way, and that the regions responsible for each step appear in the expected sequence: POS tagging, parsing, NER, semantic roles, then coreference.

StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

Inspired by the linearization exploration work of Elman, BERT is extended to a new model, StructBERT, by incorporating language structures into pre-training, and the new model is adapted to different levels of language understanding required by downstream tasks.

SciBERT: A Pretrained Language Model for Scientific Text

SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks and demonstrates statistically significant improvements over BERT.

Universal Language Model Fine-tuning for Text Classification

This work proposes Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduces techniques that are key for fine-tuning a language model.

Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model

This work proposes a simple yet effective weakly supervised pretraining objective, which explicitly forces the model to incorporate knowledge about real-world entities, and consistently outperforms BERT on four entity-related question answering datasets.

Natural Language Processing (Almost) from Scratch

We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity