Corpus ID: 231718746

First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT

  • B. Muller, Yanai Elazar, B. Sagot, Djamé Seddah
  • Published 2021
  • Computer Science
  • ArXiv
  • Multilingual pretrained language models have demonstrated remarkable zero-shot cross-lingual transfer capabilities. Such transfer emerges by fine-tuning on a task of interest in one language and evaluating on a distinct language not seen during fine-tuning. Despite promising results, we still lack a proper understanding of the source of this transfer. Using a novel layer ablation technique and analyses of the model’s internal representations, we show that multilingual BERT, a popular…
