A Balanced Data Approach for Evaluating Cross-Lingual Transfer: Mapping the Linguistic Blood Bank

Daniel Malkin, Tomasz Limisiewicz, Gabriel Stanovsky
We show that the choice of pretraining languages affects downstream cross-lingual transfer for BERT-based models. We inspect zero-shot performance in balanced data conditions to mitigate data size confounds, classifying pretraining languages that improve downstream performance as donors, and languages that are improved in zero-shot performance as recipients. We develop a method of quadratic time complexity in the number of languages to estimate these relations, instead of an exponential… 

NLP for Language Varieties of Italy: Challenges and the Path Forward

Italy is characterized by a one-of-a-kind linguistic diversity landscape in Europe, which implicitly encodes local knowledge, cultural traditions, artistic expression, and history of its speakers.



Revisiting the Primacy of English in Zero-shot Cross-lingual Transfer

English is compared against other transfer languages for fine-tuning, and other high-resource languages such as German and Russian often transfer more effectively, especially when the set of target languages is diverse or unknown a priori.

Unsupervised Cross-lingual Representation Learning at Scale

Pretraining multilingual language models at scale is shown to yield significant performance gains across a wide range of cross-lingual transfer tasks, and multilingual modeling without sacrificing per-language performance is demonstrated for the first time.

Language Contamination Explains the Cross-lingual Capabilities of English Pretrained Models

English pretrained language models, which make up the backbone of many modern NLP systems, require huge amounts of unlabeled training data. These models are generally presented as being trained only

Emerging Cross-lingual Structure in Pretrained Language Models

It is shown that transfer is possible even when there is no shared vocabulary across the monolingual corpora and also when the text comes from very different domains, and it is strongly suggested that, much like for non-contextual word embeddings, there are universal latent symmetries in the learned embedding spaces.

Massively Multilingual Transfer for NER

Evaluating on named entity recognition, it is shown that the proposed techniques for modulating the transfer are much more effective than strong baselines, including standard ensembling, and the unsupervised method rivals oracle selection of the single best individual model.

How Multilingual is Multilingual BERT?

It is concluded that M-BERT does create multilingual representations, but that these representations exhibit systematic deficiencies affecting certain language pairs, and that the model can find translation pairs.

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

The Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark is introduced, a multi-task benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks.

Are All Languages Created Equal in Multilingual BERT?

This work explores how mBERT performs on a much wider set of languages, focusing on the quality of representation for low-resource languages, measured by within-language performance, and finds that better models for low-resource languages require more efficient pretraining techniques or more data.

Cross-Lingual Ability of Multilingual BERT: An Empirical Study

A comprehensive study of the contribution of different components in M-BERT to its cross-lingual ability, finding that the lexical overlap between languages plays a negligible role, while the depth of the network is an integral part of it.

Examining Cross-lingual Contextual Embeddings with Orthogonal Structural Probes

The novel Orthogonal Structural Probe allows us to answer, for specific linguistic features, the question of whether multilingual embeddings can be aligned in a space shared across many languages, and to learn a projection based only on monolingual annotated datasets.