SAS: Self-Augmentation Strategy for Language Model Pre-training

Yifei Xu, Jingqiao Zhang, Ru He, Liangzhu Ge, Chao Yang, Cheng Yang, Ying Wu
The core of self-supervised learning for pre-training language models comprises pre-training task design and appropriate data augmentation. Most data augmentations in language model pre-training are context-independent. A seminal contextualized augmentation was recently proposed in ELECTRA, which achieved state-of-the-art performance by introducing an auxiliary generation network (the generator) to produce contextualized data augmentation for training a main discrimination network (the discriminator)…
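To make the generator/discriminator setup concrete, here is a minimal toy sketch of ELECTRA-style contextualized augmentation via replaced token detection. This is an illustration, not the papers' actual implementation: the function name is hypothetical, and the "generator" is simulated by random sampling from a vocabulary, whereas a real generator is a small masked language model conditioned on context.

```python
import random

def replaced_token_detection_example(tokens, vocab, mask_prob=0.3, seed=0):
    """Toy sketch of replaced token detection (assumed illustration):
    a stand-in generator fills in masked positions, and the
    discriminator's target is whether each token was replaced."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            # A stand-in generator proposes a replacement for the
            # masked position (a real one samples from a small MLM).
            new_tok = rng.choice(vocab)
            corrupted.append(new_tok)
            labels.append(int(new_tok != tok))  # 1 = replaced, 0 = original
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels
```

The discriminator is then trained to predict `labels` from `corrupted`, so every token position provides a training signal, which is the sample-efficiency argument behind this family of pre-training methods.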




MC-BERT: Efficient Language Pre-Training via a Meta Controller

Results over GLUE natural language understanding benchmark demonstrate that the proposed MC-BERT method is both efficient and effective: it outperforms baselines on GLUE semantic tasks given the same computational budget.

SCRIPT: Self-Critic PreTraining of Transformers

Self-CRItic Pretraining Transformers (SCRIPT) is introduced for representation learning of text, improving sample efficiency in pretraining and yielding enhanced representations, as evidenced by improved downstream task performance on GLUE and SQuAD over strong baselines.

COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining

A self-supervised learning framework that pretrains language models by COrrecting and COntrasting corrupted text sequences; it not only outperforms recent state-of-the-art pretrained models in accuracy but also improves pretraining efficiency.

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

The contextual representations learned by the proposed replaced token detection pre-training task substantially outperform the ones learned by methods such as BERT and XLNet given the same model size, data, and compute.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

Inspired by the linearization exploration work of Elman, BERT is extended to a new model, StructBERT, by incorporating language structures into pre-training, and the new model is adapted to different levels of language understanding required by downstream tasks.

Unified Language Model Pre-training for Natural Language Understanding and Generation

A new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks that compares favorably with BERT on the GLUE benchmark, and the SQuAD 2.0 and CoQA question answering tasks.

Transformers: State-of-the-Art Natural Language Processing

Transformers is presented, a library for state-of-the-art NLP, making these developments available to the community by gathering state-of-the-art general-purpose pretrained models under a unified API, together with an ecosystem of libraries, examples, tutorials, and scripts targeting many downstream NLP tasks.

XLNet: Generalized Autoregressive Pretraining for Language Understanding

XLNet is proposed, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and overcomes the limitations of BERT thanks to its autoregressive formulation.

CLEAR: Contrastive Learning for Sentence Representation

This paper proposes Contrastive LEArning for sentence Representation (CLEAR), which employs multiple sentence-level augmentation strategies to learn a noise-invariant sentence representation, and investigates through numerous experiments the key reasons that make contrastive learning effective.