Corpus ID: 199552081

StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

@article{Wang2020StructBERTIL,
  title={StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding},
  author={Wei Wang and Bin Bi and Ming Yan and Chen Wu and Zuyi Bao and Liwei Peng and Luo Si},
  journal={ArXiv},
  year={2020},
  volume={abs/1908.04577}
}
Recently, the pre-trained language model, BERT (and its robustly optimized version RoBERTa), has attracted a lot of attention in natural language understanding (NLU), and achieved state-of-the-art accuracy in various NLU tasks, such as sentiment classification, natural language inference, semantic textual similarity and question answering. Inspired by the linearization exploration work of Elman [8], we extend BERT to a new model, StructBERT, by incorporating language structures into pre-training…
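The abstract is truncated here, but the structural pre-training it refers to combines a word-level objective (shuffling short spans of tokens and training the model to restore their original order) with a sentence-level objective (predicting whether a paired sentence is the next sentence, the previous sentence, or a random one). Below is a minimal, data-side sketch of how such objectives can be constructed; the helper names, the span length K = 3, and the shuffle ratio are illustrative assumptions, not the authors' implementation.

import random

K = 3  # length of the shuffled spans; the paper shuffles trigrams (K = 3)

def make_word_objective(tokens, shuffle_ratio=0.05):
    """Corrupt the input by shuffling a few length-K spans; the model is
    trained to restore the original token order at those positions."""
    corrupted = list(tokens)
    targets = list(tokens)  # uncorrupted reconstruction targets
    n_spans = max(1, int(len(corrupted) * shuffle_ratio))
    for _ in range(n_spans):
        if len(corrupted) < K:
            break
        start = random.randrange(0, len(corrupted) - K + 1)
        span = corrupted[start:start + K]
        random.shuffle(span)
        corrupted[start:start + K] = span
    return corrupted, targets

def make_sentence_objective(prev_sent, sent, next_sent, random_sent):
    """Three-way sentence task: is the second segment the next sentence (0),
    the previous sentence (1), or a sentence from another document (2)?"""
    label = random.choice([0, 1, 2])
    second = [next_sent, prev_sent, random_sent][label]
    return (sent, second), label

print(make_word_objective("the quick brown fox jumps over the lazy dog".split()))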

Citations

SLM: Learning a Discourse Language Representation with Sentence Unshuffling
TLDR
Sentence-level Language Modeling is introduced, a new pre-training objective for learning a discourse language representation in a fully self-supervised manner by shuffling the sequence of input sentences and training a hierarchical transformer model to reconstruct the original ordering.
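As a rough illustration of the unshuffling objective summarized above, the sketch below shuffles the sentences of a passage and records the permutation that restores the original order; the function name and example text are illustrative and not taken from the SLM implementation.

import random

def make_unshuffling_example(sentences):
    """Shuffle the sentences and return the permutation that restores them."""
    order = list(range(len(sentences)))
    random.shuffle(order)
    shuffled = [sentences[i] for i in order]
    # target[j] = position in `shuffled` of the j-th original sentence
    target = [order.index(j) for j in range(len(sentences))]
    return shuffled, target

shuffled, target = make_unshuffling_example(
    ["The sky darkened.", "Rain began to fall.", "Everyone ran inside."])
print(shuffled)
print(target)  # reading `shuffled` in this order recovers the original passage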
On Losses for Modern Language Models
TLDR
It is shown that NSP is detrimental to training due to its context splitting and shallow semantic signal, and it is demonstrated that using multiple tasks in a multi-task pre-training framework provides better results than using any single auxiliary task.
StructuralLM: Structural Pre-training for Form Understanding
TLDR
This paper proposes a new pre-training approach, StructuralLM, to jointly leverage cell and layout information from scanned documents, with two new designs to make the most of the interactions between cell and layout information.
SegaBERT: Pre-training of Segment-aware BERT for Language Understanding
TLDR
A segment-aware BERT is proposed by replacing the token position embedding of the Transformer with a combination of paragraph-index, sentence-index, and token-index embeddings; experimental results show that the pre-trained model can outperform the original BERT model on various NLP tasks.
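The segment-aware position encoding summarized above can be pictured as the sum of three learned embeddings, as in the hedged PyTorch sketch below; the class name, table sizes, and the use of a sequence-level token index are assumptions rather than SegaBERT's released code.

import torch
import torch.nn as nn

class SegmentAwarePositionEmbedding(nn.Module):
    """Replaces a single position embedding with the sum of paragraph-,
    sentence-, and token-index embeddings."""
    def __init__(self, hidden=768, max_para=64, max_sent=128, max_tok=512):
        super().__init__()
        self.para = nn.Embedding(max_para, hidden)
        self.sent = nn.Embedding(max_sent, hidden)
        self.tok = nn.Embedding(max_tok, hidden)

    def forward(self, para_idx, sent_idx, tok_idx):
        # each index tensor has shape (batch, seq_len)
        return self.para(para_idx) + self.sent(sent_idx) + self.tok(tok_idx)

emb = SegmentAwarePositionEmbedding()
para = torch.zeros(1, 6, dtype=torch.long)   # all six tokens in paragraph 0
sent = torch.tensor([[0, 0, 0, 1, 1, 1]])    # two sentences
tok = torch.arange(6).unsqueeze(0)           # token index within the sequence
print(emb(para, sent, tok).shape)            # torch.Size([1, 6, 768])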
OctaNLP: A Benchmark for Evaluating Multitask Generalization of Transformer-Based Pre-trained Language Models
TLDR
This paper overviews NLP evaluation metrics, multitask benchmarks, and the recent transformer-based language models, and proposes the octaNLP benchmark for comparing the generalization capabilities of the transformer-based pre-trained language models on multiple downstream NLP tasks simultaneously.
AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing
TLDR
This comprehensive survey paper explains various core concepts like pretraining, pretraining methods, pretraining tasks, embeddings, and downstream adaptation methods; presents a new taxonomy of T-PTLMs; and gives a brief overview of various benchmarks, including both intrinsic and extrinsic.
HerBERT: Efficiently Pretrained Transformer-based Language Model for Polish
TLDR
This paper designs and thoroughly evaluates a pretraining procedure of transferring knowledge from multilingual to monolingual BERT-based models and achieves state-of-the-art results on multiple downstream tasks.
Linking-Enhanced Pre-Training for Table Semantic Parsing
TLDR
Two novel pre-training objectives are designed to impose the desired inductive bias into the learned representations for table pre-training, and a schema-aware curriculum learning approach is proposed to mitigate the impact of noise and learn effectively from the pre-training data in an easy-to-hard manner.
An Empirical Exploration of Local Ordering Pre-training for Structured Learning
TLDR
The results show that pre-trained contextual encoders can bring improvements in a structured way, suggesting that they may be able to capture higher-order patterns and feature combinations from unlabeled data.
Rethinking Denoised Auto-Encoding in Language Pre-Training
TLDR
The proposed ContrAstive Pre-Training (CAPT) encourages the consistency between representations of the original sequence and its corrupted version via unsupervised instance-wise training signals, and aids the pre-trained model in better capturing global semantics of the input via more effective sentence-level supervision.
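As a rough rendering of the instance-wise consistency signal described above, the sketch below uses a generic InfoNCE-style loss that pulls the representation of each original sequence toward its corrupted version and away from the other sequences in the batch; the function name and temperature are assumptions, and this is not CAPT's exact objective.

import torch
import torch.nn.functional as F

def consistency_loss(orig_repr, corrupted_repr, temperature=0.1):
    """orig_repr, corrupted_repr: (batch, hidden) sequence-level representations."""
    orig = F.normalize(orig_repr, dim=-1)
    corr = F.normalize(corrupted_repr, dim=-1)
    logits = orig @ corr.t() / temperature   # (batch, batch) cosine similarities
    labels = torch.arange(orig.size(0))      # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

loss = consistency_loss(torch.randn(8, 768), torch.randn(8, 768))
print(loss.item())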

References

SHOWING 1-10 OF 62 REFERENCES
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
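For context, the sketch below shows the standard masked-language-modelling corruption that this bidirectional pre-training relies on, using the commonly reported 80/10/10 mask/random/keep split; the toy vocabulary and function name are illustrative, not BERT's actual preprocessing code.

import random

VOCAB = ["the", "cat", "sat", "on", "mat", "dog", "ran"]  # toy vocabulary

def mask_tokens(tokens, mask_prob=0.15):
    """Select ~15% of positions; mask, randomly replace, or keep them,
    and record the original token as the prediction target."""
    inputs, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            labels.append(tok)                       # predict the original token
            r = random.random()
            if r < 0.8:
                inputs.append("[MASK]")
            elif r < 0.9:
                inputs.append(random.choice(VOCAB))  # random replacement
            else:
                inputs.append(tok)                   # keep unchanged
        else:
            labels.append(None)                      # no loss at this position
            inputs.append(tok)
    return inputs, labels

print(mask_tokens("the cat sat on the mat".split()))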
Improving Language Understanding by Generative Pre-Training
TLDR
The general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, improving upon the state of the art in 9 out of the 12 tasks studied.
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
TLDR
A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models, which favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge-transfer across tasks.
Multi-Task Deep Neural Networks for Natural Language Understanding
TLDR
A Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks that allows domain adaptation with substantially fewer in-domain labels than the pre-trained BERT representations.
Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks
TLDR
Supplementary training on data-rich supervised tasks, such as natural language inference, yields additional performance improvements on the GLUE benchmark, along with reduced variance across random restarts in this setting.
XLNet: Generalized Autoregressive Pretraining for Language Understanding
TLDR
XLNet is proposed, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and overcomes the limitations of BERT thanks to its autoregressive formulation.
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
TLDR
A Sentiment Treebank that includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and presents new challenges for sentiment compositionality, and introduces the Recursive Neural Tensor Network.
Enhancing Pre-Trained Language Representations with Rich Knowledge for Machine Reading Comprehension
TLDR
This work introduces KT-NET, which employs an attention mechanism to adaptively select desired knowledge from KBs, and then fuses selected knowledge with BERT to enable context- and knowledge-aware predictions.
Deep Contextualized Word Representations
TLDR
A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.
QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension
TLDR
A new Q&A architecture called QANet is proposed, which does not require recurrent networks; its encoder consists exclusively of convolution and self-attention, where convolution models local interactions and self-attention models global interactions.