BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

@article{Devlin2018BERTPO,
  title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
  author={Jacob Devlin and Ming-Wei Chang and Kenton Lee and Kristina Toutanova},
  journal={CoRR},
  year={2018},
  volume={abs/1810.04805}
}
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018; Radford et al., 2018), BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models…
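
Below is a minimal sketch of the fine-tuning recipe the abstract describes: a pre-trained bidirectional encoder with just one additional output layer for a downstream classification task. This is not the authors' released code; it uses the third-party Hugging Face transformers library, and the checkpoint name "bert-base-uncased" and the two-label classification head are illustrative assumptions.

import torch.nn as nn
from transformers import BertModel, BertTokenizer

class BertClassifier(nn.Module):
    """Pre-trained bidirectional encoder plus a single task-specific output layer."""
    def __init__(self, num_labels: int = 2):
        super().__init__()
        # Deep bidirectional encoder, pre-trained by conditioning on both
        # left and right context in all layers (assumed checkpoint name).
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # The only new parameters introduced for fine-tuning: one linear
        # layer on top of the pooled [CLS] representation.
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return self.classifier(outputs.pooler_output)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertClassifier(num_labels=2)
batch = tokenizer(["a sample sentence to classify"], return_tensors="pt", padding=True)
logits = model(batch["input_ids"], batch["attention_mask"])  # shape: (1, 2)

During fine-tuning, all parameters (the encoder plus the new output layer) would typically be updated end to end on the labeled task data.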

Citations

Publications citing this paper. Showing 1–10 of 305 citations (estimated 42% coverage).

Citation Statistics

  • 120 highly influenced citations

  • Averaged 48 citations per year over the last 3 years

References

Publications referenced by this paper. Showing 1–10 of 40 references.

Improving language understanding with unsupervised learning
Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever. Technical report, OpenAI, 2018.
Highly Influential · 8 Excerpts
