Revealing the Dark Secrets of BERT

@article{Kovaleva2019RevealingTD,
  title={Revealing the Dark Secrets of BERT},
  author={Olga Kovaleva and Alexey Romanov and Anna Rogers and Anna Rumshisky},
  journal={ArXiv},
  year={2019},
  volume={abs/1908.08593}
}
BERT-based architectures currently give state-of-the-art performance on many NLP tasks, but little is known about the exact mechanisms that contribute to their success. In the current work, we focus on the interpretation of self-attention, one of the fundamental underlying components of BERT. Using a subset of GLUE tasks and a set of handcrafted features-of-interest, we propose a methodology and carry out a qualitative and quantitative analysis of the information encoded by the…
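
The analysis sketched in the abstract operates on the raw self-attention maps that BERT produces for every layer and head. As a minimal illustration of the extraction step such an analysis starts from (a sketch assuming the HuggingFace transformers library, bert-base-uncased, and an illustrative input sentence; this is not the authors' own code):

# Minimal sketch (assumes the HuggingFace `transformers` library; not the
# authors' original code): pull out the per-layer, per-head self-attention
# maps that a head-level analysis of BERT operates on.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# `outputs.attentions` is a tuple with one tensor per layer, each of shape
# (batch, num_heads, seq_len, seq_len).
for layer_idx, layer_attn in enumerate(outputs.attentions):
    # Example handcrafted feature-of-interest: how much each head attends
    # to [CLS], which the tokenizer places at position 0.
    cls_share = layer_attn[0, :, :, 0].mean(dim=-1)  # shape: (num_heads,)
    print(f"layer {layer_idx:2d} mean attention to [CLS]:",
          [round(w, 2) for w in cls_share.tolist()])

For BERT-base this yields 12 layers of 12 heads each, i.e. 144 attention maps per input; per-head summary statistics like the [CLS] share above are one simple example of the kind of feature a qualitative and quantitative head-level analysis can be built on.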
158 Citations
  • A Primer in BERTology: What We Know About How BERT Works
  • Emergent Properties of Finetuned Language Representation Models
  • Investigating Learning Dynamics of BERT Fine-Tuning
  • Poor Man's BERT: Smaller and Faster Transformer Models
  • Pruning a BERT-based Question Answering Model
  • FastBERT: a Self-distilling BERT with Adaptive Inference Time
  • BERTnesia: Investigating the capture and forgetting of knowledge in BERT
  • The heads hypothesis: A unifying statistical approach towards understanding multi-headed attention in BERT
  • On the weak link between importance and prunability of attention heads

References

Showing 1-10 of 29 references
  • Are Sixteen Heads Really Better than One?
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  • Attention is All you Need
  • Pay Less Attention with Lightweight and Dynamic Convolutions
  • Rethinking Complex Neural Network Architectures for Document Classification
  • How transferable are features in deep neural networks?
  • GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding