Pre-trained Models for Natural Language Processing: A Survey

@article{Qiu2020PretrainedMF,
  title={Pre-trained Models for Natural Language Processing: A Survey},
  author={Xipeng Qiu and Tianxiang Sun and Yige Xu and Yunfan Shao and Ning Dai and Xuanjing Huang},
  journal={ArXiv},
  year={2020},
  volume={abs/2003.08271}
}
Recently, the emergence of pre-trained models (PTMs) has brought natural language processing (NLP) to a new era. In this survey, we provide a comprehensive review of PTMs for NLP. We first briefly introduce language representation learning and its research progress. Then we systematically categorize existing PTMs based on a taxonomy with four perspectives. Next, we describe how to adapt the knowledge of PTMs to downstream tasks. Finally, we outline some potential directions of PTMs for future research.
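The survey's central practical question is how the knowledge in a PTM is adapted to a downstream task, most commonly by fine-tuning the pre-trained encoder together with a small task head. The sketch below illustrates one fine-tuning step; it assumes the Hugging Face transformers library and the bert-base-uncased checkpoint, which are illustrative choices rather than anything prescribed by the survey.

# Minimal fine-tuning sketch: reuse a pre-trained encoder and train a
# task-specific classification head on a (toy) labelled batch.
# Assumptions: Hugging Face `transformers`, PyTorch, `bert-base-uncased`.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # e.g. binary sentiment classification
)

texts = ["a genuinely delightful film", "flat, lifeless, and overlong"]
labels = torch.tensor([1, 0])                       # toy downstream labels
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)             # loss from the task head
outputs.loss.backward()                             # one gradient step
optimizer.step()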

Citations

Pretrained Language Models for Text Generation: A Survey
This paper presents an overview of the major advances achieved in the topic of pretrained language models for text generation, and discusses how to adapt existing PLMs to model different input data and satisfy special properties in the generated text.
A Survey on Bias in Deep NLP
Bias is introduced in a formal way, its detection and correction in several networks are reviewed, and a strategy to deal with bias in deep NLP is proposed.
Towards Effective Utilization of Pre-trained Language Models
Linqing Liu, 2020
In the natural language processing (NLP) literature, neural networks are becoming increasingly deeper and more complex. Recent advancements in neural NLP are large pretrained language models…
Advances of Transformer-Based Models for News Headline Generation
Two pretrained Transformer-based models (mBART and BertSumAbs) are fine-tuned for headline generation and achieve new state-of-the-art results on the RIA and Lenta datasets of Russian news.
Knowledge Inheritance for Pre-trained Language Models
A novel pre-training framework named “knowledge inheritance” (KI) is introduced, which combines both self-learning and teacher-guided learning to efficiently train larger PLMs, and it is shown that KI can well support lifelong learning and knowledge transfer.
Probing Multilingual Language Models for Discourse
It is found that the XLM-RoBERTa family of models consistently shows the best performance, by simultaneously being good monolingual models and degrading relatively little in a zero-shot setting.
A Forgotten Strategy for Pooled Contextualized Embedding Learning
Preliminary experiments on the WNUT-17 task show the effectiveness of the forgotten strategy, and uncover that embeddings that are diverse in terms of cosine similarity are helpful in forming an aggregated embedding.
Localizing Q&A Semantic Parsers for Any Language in a Day
The proposed Semantic Parser Localizer (SPL), a toolkit that leverages Neural Machine Translation (NMT) systems to localize a semantic parser for a new language, enables any software developer to add a new language capability to any QA system for a new domain in less than 24 hours.
Probing Pretrained Language Models for Lexical Semantics
A systematic empirical analysis across six typologically diverse languages and five different lexical tasks indicates patterns and best practices that hold universally, but also points to prominent variations across languages and tasks.
REPT: Bridging Language Models and Machine Reading Comprehension via Retrieval-Based Pre-training
This work introduces two self-supervised tasks to strengthen evidence extraction during pre-training, which is further inherited by downstream MRC tasks through the consistent retrieval operation and model architecture.

References

Showing 1-10 of 314 references
Semi-supervised sequence tagging with bidirectional language models
A general semi-supervised approach for adding pre-trained context embeddings from bidirectional language models to NLP systems, applied to sequence labeling tasks, surpassing previous systems that use other forms of transfer or joint learning with additional labeled data and task-specific gazetteers.
Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction
The proposed method provides an effective way of extracting constituency trees from pre-trained LMs without training, and reports intriguing findings in the induced trees, including the fact that pre-trained LMs outperform other approaches in correctly demarcating adverb phrases in sentences.
Pre-trained language model representations for language generation
This paper examines different strategies to integrate pre-trained representations into sequence-to-sequence models, applies them to neural machine translation and abstractive summarization, and finds that pre-trained representations are most effective when added to the encoder network, which slows inference by only 14%.
BERT Rediscovers the Classical NLP Pipeline
This work finds that the model represents the steps of the traditional NLP pipeline in an interpretable and localizable way, and that the regions responsible for each step appear in the expected sequence: POS tagging, parsing, NER, semantic roles, then coreference.
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding
Inspired by the linearization exploration work of Elman, BERT is extended to a new model, StructBERT, by incorporating language structures into pre-training, and the new model is adapted to different levels of language understanding required by downstream tasks.
SciBERT: A Pretrained Language Model for Scientific Text
SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks and demonstrates statistically significant improvements over BERT.
Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model
This work proposes a simple yet effective weakly supervised pretraining objective, which explicitly forces the model to incorporate knowledge about real-world entities, and consistently outperforms BERT on four entity-related question answering datasets.
Universal Language Model Fine-tuning for Text Classification
This work proposes Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduces techniques that are key for fine-tuning a language model.
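Among the techniques ULMFiT introduces is discriminative fine-tuning: lower, more general layers of the pre-trained model are updated with smaller learning rates than upper, more task-specific ones. Below is a minimal sketch of that idea in PyTorch; the four-layer model and its attribute names are hypothetical stand-ins, while the decay factor 2.6 is the value suggested in the ULMFiT paper.

# Discriminative fine-tuning sketch: one parameter group per layer, with the
# learning rate decaying by a constant factor from the task head downwards.
# The model layout here is a hypothetical stand-in for a pre-trained network.
import torch
import torch.nn as nn

model = nn.ModuleDict({
    "layers": nn.ModuleList([nn.Linear(128, 128) for _ in range(4)]),
    "head": nn.Linear(128, 2),
})

base_lr, decay = 2e-5, 2.6
param_groups = [{"params": model["head"].parameters(), "lr": base_lr}]
for depth, layer in enumerate(reversed(list(model["layers"]))):
    # earlier (more general) layers get progressively smaller learning rates
    param_groups.append({"params": layer.parameters(),
                         "lr": base_lr / (decay ** (depth + 1))})

optimizer = torch.optim.AdamW(param_groups)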
Specializing Unsupervised Pretraining Models for Word-Level Semantic Similarity
The experiments suggest that the lexically informed BERT (LIBERT), specialized for word-level semantic similarity, yields better performance than the lexically blind “vanilla” BERT on several language understanding tasks, and shows consistent gains on 3 benchmarks for lexical simplification.
From static to dynamic word representations: a survey
This survey provides a comprehensive typology of word representation models from a novel perspective that the development from static to dynamic embeddings can effectively address the polysemy problem.