BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

@inproceedings{Devlin2018BERTPO,
  title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
  author={Jacob Devlin and Ming-Wei Chang and Kenton Lee and Kristina Toutanova},
  booktitle={NAACL-HLT},
  year={2019}
}

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a; Radford et al., 2018), BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
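
The fine-tuning recipe described in the abstract (a pre-trained bidirectional encoder plus a single task-specific output layer) can be illustrated with a minimal sketch. This is not the authors' original TensorFlow implementation; it assumes PyTorch and the Hugging Face transformers library, and the class name below is hypothetical.

# A minimal sketch, not the paper's original code.
# Assumes PyTorch and the Hugging Face `transformers` library;
# `BertForSentencePairClassification` is a hypothetical name used here for illustration.
import torch
from torch import nn
from transformers import BertModel, BertTokenizer

class BertForSentencePairClassification(nn.Module):
    """Pre-trained BERT encoder plus one added linear output layer."""

    def __init__(self, num_labels: int = 3):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")              # pre-trained, deeply bidirectional encoder
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)   # the single new output layer

    def forward(self, **encoded):
        hidden = self.bert(**encoded).last_hidden_state   # (batch, seq_len, hidden_size)
        cls_vec = hidden[:, 0]                            # representation of the [CLS] token
        return self.classifier(cls_vec)                   # task logits

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSentencePairClassification()

# Sentence-pair input (e.g. an NLI-style premise/hypothesis pair).
batch = tokenizer("A man is playing a guitar.",
                  "Someone is playing an instrument.",
                  return_tensors="pt")
logits = model(**batch)   # fine-tuning minimizes cross-entropy on these logits, updating all parameters end-to-end
print(logits.shape)       # torch.Size([1, 3])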

Key Quantitative Results

  • BERT obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).

Citations

Publications citing this paper.
SHOWING 1-10 OF 466 CITATIONS, ESTIMATED 44% COVERAGE

BAM! Born-Again Multi-Task Networks for Natural Language Understanding

  • 2019
CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

Unbabel's Submission to the WMT2019 APE Shared Task: BERT-based Encoder-Decoder for Automatic Post-Editing

António V. Lopes, M. Amin Farajian, Gonçalo M. Correia, Jonay Trénous, André F. T. Martins
  • ArXiv
  • 2019
CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

75 Languages, 1 Model: Parsing Universal Dependencies Universally

CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

A Lightweight Recurrent Network for Sequence Modeling

Biao Zhang, Rico Sennrich
  • ArXiv
  • 2019
HIGHLY INFLUENCED

Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT

CITES BACKGROUND & METHODS
HIGHLY INFLUENCED

Cross-lingual Language Model Pretraining

CITES RESULTS, BACKGROUND & METHODS
HIGHLY INFLUENCED

DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs

  • NAACL-HLT
  • 2019
CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

FastFusionNet: New State-of-the-Art for DAWNBench SQuAD

  • ArXiv
  • 2019
CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

CITATION STATISTICS

  • 178 Highly Influenced Citations

  • Averaged 51 Citations per year over the last 3 years

References

Publications referenced by this paper.
SHOWING 1-10 OF 54 REFERENCES

Deep contextualized word representations

HIGHLY INFLUENTIAL

Improving language understanding with unsupervised learning

Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever
  • Technical report, OpenAI
  • 2018
HIGHLY INFLUENTIAL

Attention Is All You Need

HIGHLY INFLUENTIAL
