How Does BERT Answer Questions?: A Layer-Wise Analysis of Transformer Representations

@inproceedings{Aken2019HowDB,
  title={How Does BERT Answer Questions?: A Layer-Wise Analysis of Transformer Representations},
  author={Betty van Aken and Benjamin Winter and Alexander L{\"o}ser and Felix A. Gers},
  booktitle={Proceedings of the 28th ACM International Conference on Information and Knowledge Management},
  year={2019}
}
Bidirectional Encoder Representations from Transformers (BERT) reach state-of-the-art results in a variety of Natural Language Processing tasks. However, understanding of their internal functioning is still insufficient and unsatisfactory. In order to better understand BERT and other Transformer-based models, we present a layer-wise analysis of BERT's hidden states. Unlike previous research, which mainly focuses on explaining Transformer models by their attention weights, we argue that hidden…
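The analysis described in the abstract operates on per-layer token representations rather than on attention weights. As a rough illustration of what "layer-wise hidden states" refers to, the sketch below pulls one representation per layer for a question/context pair. It is a minimal sketch, not the authors' code: the Hugging Face transformers API, the bert-base-uncased checkpoint, and the example inputs are assumptions made for illustration.

```python
# Minimal sketch (assumption: Hugging Face `transformers` + `torch`, generic
# "bert-base-uncased" checkpoint rather than the fine-tuned QA models studied
# in the paper) of how per-layer hidden states can be collected for analysis.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

# Encode a question/context pair as one sequence, the way BERT QA models expect.
inputs = tokenizer(
    "Where was the treaty signed?",
    "The treaty was signed in Paris in 1783.",
    return_tensors="pt",
)

with torch.no_grad():
    outputs = model(**inputs)

# `hidden_states` is a tuple with one tensor for the embedding layer plus one
# per Transformer layer, each of shape (batch, sequence_length, hidden_size).
for layer, states in enumerate(outputs.hidden_states):
    print(f"layer {layer:2d}: token representations {tuple(states.shape)}")
```

Probing classifiers or dimensionality reduction (e.g., PCA) can then be applied to each layer's tensor to inspect what information the representations at that depth encode.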
33 Citations
  • What Happens To BERT Embeddings During Fine-tuning?
  • BERTnesia: Investigating the capture and forgetting of knowledge in BERT
  • Inserting Information Bottlenecks for Attribution in Transformers
  • Subjective Question Answering: Deciphering the inner workings of Transformers in the realm of subjectivity
  • Attention Flows: Analyzing and Comparing Attention Mechanisms in Language Models
  • Modifying Memories in Transformer Models
  • ...
