• Computer Science
  • Published in NAACL-HLT 2019

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

@inproceedings{Devlin2019BERTPO,
  title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
  author={Jacob Devlin and Ming-Wei Chang and Kenton Lee and Kristina Toutanova},
  booktitle={NAACL-HLT},
  year={2019}
}
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such…
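The fine-tuning recipe the abstract describes (reuse the pre-trained bidirectional encoder and add a single task-specific output layer) can be illustrated with a minimal sketch. The snippet below assumes the Hugging Face transformers and torch packages, the bert-base-uncased checkpoint, and a toy binary sentiment example; none of these specifics come from the paper's text.

import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Pre-trained bidirectional encoder plus one randomly initialized
# classification layer over the [CLS] representation.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# One hypothetical labeled example for a binary task (e.g. sentiment).
inputs = tokenizer("The movie was surprisingly good.", return_tensors="pt")
labels = torch.tensor([1])

# Forward pass; fine-tuning repeats this over a labeled dataset and
# back-propagates through both the new output layer and the encoder.
outputs = model(**inputs, labels=labels)
outputs.loss.backward()
print(outputs.logits.shape)  # torch.Size([1, 2])

In a real fine-tuning run an optimizer step would follow each backward pass; the point of the sketch is only that no task-specific architecture beyond the single output layer is needed.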

Citations

Publications citing this paper.
SHOWING 1-10 OF 3,755 CITATIONS

  • A Mutual Information Maximization Perspective of Language Representation Learning (2019). Cites methods & background; highly influenced.
  • BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning. Cites methods & background; highly influenced.
  • Contextual Grounding of Natural Language Entities in Images. Cites methods & background; highly influenced.
  • How Can We Know What Language Models Know? Cites background & methods; highly influenced.
  • "Why is 'Chicago' deceptive?" Towards Building Model-Driven Tutorials for Humans. Cites background & methods; highly influenced.

CITATION STATISTICS

  • 1,413 highly influenced citations
  • Averaged 1,139 citations per year from 2017 through 2019
  • 4,542% increase in citations per year in 2019 over 2018 (see the sketch below for how such figures are computed)
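For completeness, the arithmetic behind listing statistics like those above (average citations per year and year-over-year percentage increase) is sketched below; the per-year counts in the snippet are hypothetical placeholders, not values taken from this page.

# Hypothetical per-year citation counts (placeholders, not this page's data).
counts = {2017: 10, 2018: 100, 2019: 4000}

# Average citations per year across the 2017-2019 window.
avg_per_year = sum(counts.values()) / len(counts)

# Percentage increase of the 2019 count over the 2018 count.
pct_increase = (counts[2019] - counts[2018]) / counts[2018] * 100

print(f"average citations per year: {avg_per_year:.0f}")  # 1370
print(f"increase in 2019 over 2018: {pct_increase:.0f}%")  # 3900%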

References

Publications referenced by this paper.
SHOWING 1-10 OF 41 REFERENCES

  • Deep contextualized word representations.
  • GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. Highly influential.
  • Improving language understanding with unsupervised learning. Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever. Technical report, OpenAI, 2018. Highly influential.
  • Attention is All you Need. Highly influential.