Corpus ID: 235683093

Revisiting the Primacy of English in Zero-shot Cross-lingual Transfer

Iulia Turc, Kenton Lee, Jacob Eisenstein, Ming-Wei Chang, Kristina Toutanova
Despite their success, large pre-trained multilingual models have not completely alleviated the need for labeled data, which is cumbersome to collect for all target languages. Zero-shot cross-lingual transfer is emerging as a practical solution: pre-trained models later fine-tuned on one transfer language exhibit surprising performance when tested on many target languages. English is the dominant source language for transfer, as reinforced by popular zero-shot benchmarks. However, this default…
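The zero-shot protocol described above (fine-tune on one transfer language, evaluate directly on others) can be illustrated with a toy sketch. The hand-made "multilingual embedding" and the nearest-centroid classifier below are stand-ins chosen for illustration only, not the method of any paper listed here; a real setup would fine-tune a pre-trained encoder such as mBERT.

```python
# Toy sketch of zero-shot cross-lingual transfer: a classifier is
# trained only on source-language (English) examples, then evaluated
# on a target language it never saw labels for. The shared embedding
# table is a hypothetical stand-in for a pre-trained multilingual encoder.

# Hypothetical language-agnostic embeddings: translation pairs lie close together.
EMBED = {
    "good": (1.0, 0.0), "bon": (1.0, 0.1),       # positive cluster (en / fr)
    "bad": (-1.0, 0.0), "mauvais": (-1.0, 0.1),  # negative cluster (en / fr)
}

def train_centroids(labeled_words):
    """Nearest-centroid 'fine-tuning' on source-language examples only."""
    sums = {}
    for word, label in labeled_words:
        x, y = EMBED[word]
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}

def predict(centroids, word):
    """Assign the label whose centroid is nearest in the shared space."""
    x, y = EMBED[word]
    return min(centroids,
               key=lambda lab: (x - centroids[lab][0]) ** 2 + (y - centroids[lab][1]) ** 2)

# Fine-tune on English only...
centroids = train_centroids([("good", "pos"), ("bad", "neg")])

# ...then evaluate zero-shot on French: no French labels were ever seen.
print(predict(centroids, "bon"))      # -> pos
print(predict(centroids, "mauvais"))  # -> neg
```

Because the encoder maps translations near one another, decision boundaries learned from English carry over to the target language; that shared geometry is what the papers below probe and exploit.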


Predicting the Performance of Multilingual NLP Models
This paper proposes an alternate solution for evaluating a model across languages, one which makes use of the model's existing performance scores on the languages for which a particular task has test sets.


From Zero to Hero: On the Limitations of Zero-Shot Language Transfer with Multilingual Transformers
It is demonstrated that inexpensive few-shot transfer (i.e., additional fine-tuning on a few target-language instances) is surprisingly effective across the board, warranting more research efforts that reach beyond the limiting zero-shot conditions.
Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT
This paper explores the broader cross-lingual potential of mBERT (multilingual BERT) as a zero-shot language-transfer model on five NLP tasks covering a total of 39 languages from various language families: NLI, document classification, NER, POS tagging, and dependency parsing.
Choosing Transfer Languages for Cross-Lingual Learning
This paper frames the task of automatically selecting optimal transfer languages as a ranking problem, builds models that combine several language- and dataset-level features to perform this prediction, and demonstrates that these models predict good transfer languages much better than ad hoc baselines that consider single features in isolation.
XNLI: Evaluating Cross-lingual Sentence Representations
This work constructs an evaluation set for cross-lingual language understanding (XLU) by extending the development and test sets of the Multi-Genre Natural Language Inference Corpus to 14 languages, including low-resource languages such as Swahili and Urdu, and finds that XNLI represents a practical and challenging evaluation suite and that directly translating the test data yields the best performance among the available baselines.
Unsupervised Cross-lingual Representation Learning at Scale
It is shown that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks, and the possibility of multilingual modeling without sacrificing per-language performance is shown for the first time.
Pivot-based Transfer Learning for Neural Machine Translation between Non-English Languages
We present effective pre-training strategies for neural machine translation (NMT) using parallel corpora involving a pivot language, i.e., source-pivot and pivot-target, leading to a significant…
LAReQA: Language-agnostic Answer Retrieval from a Multilingual Pool
It is found that augmenting training data via machine translation is effective and improves significantly over using mBERT out-of-the-box, underscoring the claim that language-agnostic retrieval is a substantively new kind of cross-lingual evaluation.
Cross-Lingual Ability of Multilingual BERT: An Empirical Study
A comprehensive study of the contribution of different components in M-BERT to its cross-lingual ability, finding that the lexical overlap between languages plays a negligible role, while the depth of the network is an integral part of it.
Leveraging Small Multilingual Corpora for SMT Using Many Pivot Languages
This work is the first of its kind to attempt the simultaneous utilization of 7 pivot languages at decoding time, and shows that such pivoting aids the learning of additional phrase pairs which are not learned when the direct source-target corpus is small.
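The pivoting idea behind the two entries above can be sketched in miniature: when the direct source-target corpus is small, new phrase pairs are induced by chaining source-pivot and pivot-target tables. The phrase tables below are hypothetical toy data, not from either paper.

```python
# Toy sketch of pivot-based phrase-pair induction: compose a
# source->pivot table with a pivot->target table through the shared
# pivot language (here English). All entries are illustrative stand-ins.

src_to_pivot = {"guten morgen": "good morning", "danke": "thank you"}  # de -> en
pivot_to_tgt = {"good morning": "bonjour", "thank you": "merci"}       # en -> fr

def compose(src_pivot, pivot_tgt):
    """Induce source->target pairs by chaining through the pivot language."""
    return {src: pivot_tgt[piv]
            for src, piv in src_pivot.items()
            if piv in pivot_tgt}

induced = compose(src_to_pivot, pivot_to_tgt)
print(induced)  # -> {'guten morgen': 'bonjour', 'danke': 'merci'}
```

Real systems compose probabilistic phrase tables (multiplying translation probabilities) rather than exact string matches, but the chaining structure is the same.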
How Multilingual is Multilingual BERT?
It is concluded that M-BERT does create multilingual representations, but that these representations exhibit systematic deficiencies affecting certain language pairs, and that the model can find translation pairs.