BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

@inproceedings{Devlin2019BERTPO,
  title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
  author={J. Devlin and Ming-Wei Chang and Kenton Lee and Kristina Toutanova},
  booktitle={NAACL-HLT},
  year={2019}
}
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
  • Published in NAACL-HLT 2019
  • Computer Science
  • We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. [...] It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
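
A minimal sketch (not from the paper) of the fine-tuning approach BERT is known for: a pre-trained bidirectional encoder with a small task-specific classification head, applied to a sentence-pair task such as MultiNLI. It assumes the Hugging Face transformers library; the bert-base-uncased checkpoint, the label count, and the example sentence pair are illustrative assumptions, not details taken from this page.

import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Load the pre-trained encoder plus a freshly initialized classification head.
# The checkpoint name is an assumption for illustration.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

# Pack a premise/hypothesis pair into a single input sequence, as BERT does
# for sentence-pair tasks.
inputs = tokenizer(
    "A soccer game with multiple males playing.",
    "Some men are playing a sport.",
    return_tensors="pt",
)

# One training step; real fine-tuning iterates over mini-batches for a few epochs.
labels = torch.tensor([0])  # hypothetical label id (e.g. 0 = entailment)
outputs = model(**inputs, labels=labels)
outputs.loss.backward()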
    11,718 Citations

    Selected citing papers:
    • Span Selection Pre-training for Question Answering (9 citations, highly influenced)
    • Extending Answer Prediction for Deep Bi-directional Transformers (1 citation, highly influenced)
    • BERT for Question Answering on SQuAD 2.0 (1 citation, highly influenced)
    • TRANS-BLSTM: Transformer with Bidirectional LSTM for Language Understanding (2 citations, highly influenced)
    • Utilizing Bidirectional Encoder Representations from Transformers for Answer Selection (2 citations, highly influenced)
    • BERTSel: Answer Selection with Pre-trained Models (4 citations, highly influenced)
    • Incorporating BERT into Neural Machine Translation (38 citations, highly influenced)
    • Unified Language Model Pre-training for Natural Language Understanding and Generation (266 citations, highly influenced)
    • Improving SQUAD 2.0 Performance using BERT + X (Danny Takeuchi, 2019; 1 citation)
