• Corpus ID: 214641214

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

Junjie Hu, Sebastian Ruder, Aditya Siddhant, Graham Neubig, Orhan Firat, Melvin Johnson

Much recent progress in applications of machine learning models to NLP has been driven by benchmarks that evaluate models across a wide variety of tasks. However, these broad-coverage benchmarks have been mostly limited to English, and despite an increasing interest in multilingual models, a benchmark that enables the comprehensive evaluation of such methods on a diverse range of languages and tasks is still missing. To this end, we introduce the Cross-lingual TRansfer Evaluation of… 

Shapley Head Pruning: Identifying and Removing Interference in Multilingual Transformers

This work uses Shapley Values, a credit allocation metric from coalitional game theory, to identify attention heads that introduce interference and shows that removing identified attention heads from a fixed model improves performance for a target language on both sentence classification and structural prediction, seeing gains as large as 24.7%.

Generating Extended and Multilingual Summaries with Pre-trained Transformers

The results show that fine-tuning mT5 on all the languages combined significantly improves the summarisation performance on low-resource languages and the combination of an extractive model with an abstractive one can be used to create extended abstractive summaries from long input documents.

Bilingual alignment transfers to multilingual alignment for unsupervised parallel text mining

Bilingual training techniques as proposed can be applied to get sentence representations with multilingual alignment, and dual-pivot transfer is introduced: training on one language pair and evaluating on other pairs.

Language-agnostic BERT Sentence Embedding

It is shown that introducing a pre-trained multilingual language model dramatically reduces the amount of parallel training data required to achieve good performance by 80%, and a model that achieves 83.7% bi-text retrieval accuracy over 112 languages on Tatoeba is released.

Cross-Lingual Language Model Meta-Pretraining

This paper proposes cross-lingual language model meta-pretraining, which introduces an additional meta-pretraining phase before cross-lingual pretraining, where the model learns generalization ability on a large-scale monolingual corpus and focuses on learning cross-lingual transfer on a multilingual corpus.

On the Language-specificity of Multilingual BERT and the Impact of Fine-tuning

The results presented here suggest that the process of fine-tuning causes a reorganisation of the model’s limited representational capacity, enhancing language-independent representations at the expense of language-specific ones.

On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation

It is demonstrated that 1) adapter-based tuning outperforms fine-tuning on low-resource and cross-lingual tasks; 2) it is more robust to overfitting and less sensitive to changes in learning rates.

NewsEmbed: Modeling News through Pre-trained Document Representations

A novel approach to mine semantically relevant fresh documents, and their topic labels, with little human supervision is proposed, and a multitask model called NewsEmbed is designed that alternates between contrastive learning and multi-label classification training to derive a universal document encoder.

UXLA: A Robust Unsupervised Data Augmentation Framework for Zero-Resource Cross-Lingual NLP

UXLA is proposed, a novel unsupervised data augmentation framework for zero-resource transfer learning scenarios that aims to solve cross-lingual adaptation problems from a source language task distribution to an unknown target-language task distribution, assuming no training label in the target language.

English Intermediate-Task Training Improves Zero-Shot Cross-Lingual Transfer Too

This work evaluates intermediate-task transfer in a zero-shot cross-lingual setting on the XTREME benchmark, and finds that MNLI, SQuAD and HellaSwag achieve the best overall results as intermediate tasks, while multi-task intermediate training offers small additional improvements.



Word Translation Without Parallel Data

It is shown that a bilingual dictionary can be built between two languages without using any parallel corpora, by aligning monolingual word embedding spaces in an unsupervised way.

How to (Properly) Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions

It is empirically demonstrated that the performance of CLE models largely depends on the task at hand and that optimizing CLE models for BLI may hurt downstream performance, and the most robust supervised and unsupervised CLE models are identified.

Massively Multilingual Transfer for NER

Evaluating on named entity recognition, it is shown that the proposed techniques for modulating the transfer are much more effective than strong baselines, including standard ensembling, and the unsupervised method rivals oracle selection of the single best individual model.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

BERT is a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

A Corpus for Multilingual Document Classification in Eight Languages

A new subset of the Reuters corpus with balanced class priors for eight languages is proposed, adding Italian, Russian, Japanese and Chinese, which provides strong baselines for all language transfer directions using multilingual word and sentence embeddings respectively.

Learning bilingual word embeddings with (almost) no bilingual data

This work further reduces the need for bilingual resources using a very simple self-learning approach that can be combined with any dictionary-based mapping technique, and works with as little bilingual evidence as a 25-word dictionary or even an automatically generated list of numerals.

Baselines and Test Data for Cross-Lingual Inference

This paper proposes to advance the research in SNLI-style natural language inference toward multilingual evaluation and provides test data for four major languages (Arabic, French, Spanish, and Russian), along with baselines based on cross-lingual word embeddings and machine translation.

Exploiting Similarities among Languages for Machine Translation

This method can translate missing word and phrase entries by learning language structures from large monolingual data and the mapping between languages from small bilingual data; it uses distributed representations of words and learns a linear mapping between the vector spaces of languages.

TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages

A quantitative analysis of the data quality and example-level qualitative linguistic analyses of observed language phenomena that would not be found in English-only corpora are presented.