BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

@inproceedings{Devlin2019BERTPO,
  title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
  author={Jacob Devlin and Ming-Wei Chang and Kenton Lee and Kristina Toutanova},
  booktitle={NAACL-HLT},
  year={2019}
}
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. [...] It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
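
To make "Bidirectional Encoder Representations from Transformers" concrete, here is a minimal sketch that loads a pre-trained BERT-Base checkpoint and extracts one contextual vector per WordPiece token. It assumes the Hugging Face transformers library and PyTorch rather than the TensorFlow code released with the paper; "bert-base-uncased" is the publicly released 12-layer, 768-dimensional uncased model.

import torch
from transformers import AutoModel, AutoTokenizer

# Assumption: the released BERT-Base (uncased) weights are available
# through Hugging Face under the name "bert-base-uncased".
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

# Tokenize into WordPiece ids; [CLS] and [SEP] are added automatically.
inputs = tokenizer("BERT conditions on both left and right context.",
                   return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One bidirectional contextual embedding per token:
# shape (batch_size, sequence_length, hidden_size=768) for BERT-Base.
print(outputs.last_hidden_state.shape)

Fine-tuning for a downstream task such as MultiNLI or SQuAD adds a small task-specific output layer on top of these representations, as described in the paper.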
TRANS-BLSTM: Transformer with Bidirectional LSTM for Language Understanding
Extending Answer Prediction for Deep Bi-directional Transformers
BERT for Question Answering on SQuAD 2.0
Unified Language Model Pre-training for Natural Language Understanding and Generation
Incorporating BERT into Neural Machine Translation
BERTSel: Answer Selection with Pre-trained Models
Hierarchical Transformers for Long Document Classification
