BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

@inproceedings{Devlin2019BERTPO,
  title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
  author={Jacob Devlin and Ming-Wei Chang and Kenton Lee and Kristina Toutanova},
  booktitle={NAACL-HLT},
  year={2019}
}

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. [...] It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
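As a quick illustration of what a bidirectional encoder buys over a left-to-right language model, here is a minimal masked-token-prediction sketch. It assumes the Hugging Face transformers library, PyTorch, the public bert-base-uncased checkpoint, and an illustrative input sentence; none of these appear on this page, and this is not the authors' code.

# Minimal sketch (assumes Hugging Face `transformers`, `torch`, and the
# public `bert-base-uncased` checkpoint; illustrative only).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# BERT conditions on both left and right context of the [MASK] token,
# which is the "deep bidirectional" property the abstract refers to.
text = "BERT is a deeply [MASK] transformer encoder."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the highest-scoring vocabulary item.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))  # e.g. a word like "bidirectional"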
16,830 Citations
TRANS-BLSTM: Transformer with Bidirectional LSTM for Language Understanding (4 citations, Highly Influenced)
Extending Answer Prediction for Deep Bi-directional Transformers (1 citation, Highly Influenced)
BERT for Question Answering on SQuAD 2.0 (2 citations, Highly Influenced)
Utilizing Bidirectional Encoder Representations from Transformers for Answer Selection (4 citations, Highly Influenced)
Unified Language Model Pre-training for Natural Language Understanding and Generation (373 citations, Highly Influenced)
Incorporating BERT into Neural Machine Translation (72 citations, Highly Influenced)
BERTSel: Answer Selection with Pre-trained Models (6 citations, Highly Influenced)
Hierarchical Transformers for Long Document Classification (27 citations, Highly Influenced)

References

Showing 1-10 of 60 references
Semi-supervised sequence tagging with bidirectional language models (381 citations)
Attention is All you Need (17,972 citations, Highly Influential)
Semi-Supervised Sequence Modeling with Cross-View Training (171 citations, Highly Influential)
Dissecting Contextual Word Embeddings: Architecture and Representation (192 citations, Highly Influential)
QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension (543 citations)
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (1,349 citations, Highly Influential)
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank (4,301 citations)
Character-Level Language Modeling with Deeper Self-Attention (142 citations)
Skip-Thought Vectors (1,670 citations)