A Primer in BERTology: What We Know About How BERT Works

@article{Rogers2020API,
  title={A Primer in BERTology: What We Know About How BERT Works},
  author={Anna Rogers and Olga Kovaleva and Anna Rumshisky},
  journal={Transactions of the Association for Computational Linguistics},
  year={2020},
  volume={8},
  pages={842-866}
}
Transformer-based models have pushed the state of the art in many areas of NLP, but our understanding of what is behind their success is still limited. This paper is the first survey of over 150 studies of the popular BERT model. We review the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue, and approaches to compression. We then outline…
177 Citations
• When BERT Plays the Lottery, All Tickets Are Winning
• DynaBERT: Dynamic BERT with Adaptive Width and Depth
• BERTnesia: Investigating the capture and forgetting of knowledge in BERT
• Using Prior Knowledge to Guide BERT's Attention in Semantic Textual Matching Tasks
• Improving Task-Agnostic BERT Distillation with Layer Mapping Search
• Which *BERT? A Survey Organizing Contextualized Encoders
• Transferability of Contextual Representations for Question Answering
• …
