XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation

Yaobo Liang, Nan Duan, Yeyun Gong, Ning Wu, Fenfei Guo, Weizhen Qi, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao, Xiaodong Fan, Bruce Zhang, Rahul Agrawal, Edward Cui, Sining Wei, Taroon Bharti, Ying Qiao, Jiun-Hung Chen, Winnie Wu, Shuguang Liu, Fan Yang, Daniel Fernando Campos, Rangan Majumder and Ming Zhou. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).

In this paper, we introduce XGLUE, a new benchmark dataset that can be used to train large-scale cross-lingual pre-trained models using multilingual and bilingual corpora and to evaluate their performance across a diverse set of cross-lingual tasks. Compared to GLUE (Wang et al., 2019), which is labeled in English for natural language understanding tasks only, XGLUE has two main advantages: (1) it provides 11 diversified tasks that cover both natural language understanding and generation scenarios…

IndoNLG: Benchmark and Resources for Evaluating Indonesian Natural Language Generation

IndoNLG is the first benchmark to measure natural language generation (NLG) progress in three low-resource yet widely spoken languages of Indonesia; it shows that IndoBART and IndoGPT achieve competitive performance on all tasks despite using only one-fifth the parameters of the larger multilingual model mBART-large (Liu et al., 2020).

Self-Supervised Augmentation and Generation for Multi-lingual Text Advertisements at Bing

A unified Self-Supervised Augmentation and Generation (SAG) architecture is proposed to handle the multi-lingual text advertisements generation task in a real production scenario and a self-supervised adaptive filtering structure is developed to alleviate the impact of the noise in the augmented data.

Multi2WOZ: A Robust Multilingual Dataset and Conversational Pretraining for Task-Oriented Dialog

A new framework is proposed for multilingual conversational specialization of pretrained language models (PrLMs) that aims to facilitate cross-lingual transfer for arbitrary downstream TOD tasks, and it is shown that the best performance entails the combination of conversational specialization in the target language and few-shot transfer for the concrete TOD task.

OpenHands: Making Sign Language Recognition Accessible with Pose-based Pretrained Models across Languages

This work establishes that pretraining is effective for sign language recognition by demonstrating (a) improved fine-tuning performance, especially in low-resource settings, and (b) high cross-lingual transfer from Indian-SL to a few other sign languages.

Zero-shot Multi-lingual Interrogative Question Generation for "People Also Ask" at Bing

This work designs a system for supporting multi-lingual QG in the "People Also Ask" (PAA) module for Bing and demonstrates how knowledge transfer in multi-lingual IQG (Interrogative QG) can be significantly improved using auxiliary tasks in either a multi-task or a pre-training task setting.

How Linguistically Fair Are Multilingual Pre-Trained Language Models?

This work scrutinizes the choices made in previous work, proposes a few different strategies for fair and efficient model selection based on the principles of fairness in economics and social choice theory, and emphasizes Rawlsian fairness.

Multilingual Argument Mining: Datasets and Analysis

This work explores the potential of transfer learning using the multilingual BERT model to address argument mining tasks in non-English languages, based on English datasets and the use of machine translation, and focuses on the translate-train approach.
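The translate-train approach mentioned above can be sketched in a few lines. The `translate` argument below is a placeholder for any machine-translation system; all names here are illustrative, not from the paper:

```python
def translate_train(english_data, translate):
    """Translate-train, sketched: machine-translate the labelled English
    training set into the target language, then fine-tune on the result.
    `translate` stands in for any machine-translation system."""
    return [(translate(text), label) for text, label in english_data]

# Toy stand-in for an MT system, mapping one English phrase to German.
fake_mt = {"good point": "gutes Argument"}.get
train_de = translate_train([("good point", "claim")], fake_mt)
```

The labels are kept unchanged while only the inputs are translated, which is what lets an English-labelled dataset supervise a non-English model.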

Discovering Representation Sprachbund For Multilingual Pre-Training

This work proposes to generate language representations from multilingual pretrained models and conducts linguistic analysis to show that language representation similarity reflects linguistic similarity from multiple perspectives, including language family, geographical sprachbund, lexicostatistics and syntax.

GLGE: A New General Language Generation Evaluation Benchmark

The General Language Generation Evaluation (GLGE), a new multi-task benchmark for evaluating the generalization capabilities of NLG models across eight language generation tasks, is presented and a leaderboard with strong baselines including MASS, BART, and ProphetNet is built.

Retrofitting Multilingual Sentence Embeddings with Abstract Meaning Representation

Experimental results show that retrofitting multilingual sentence embeddings with AMR leads to better state-of-the-art performance on both semantic textual similarity and transfer tasks.

Unsupervised Cross-lingual Representation Learning at Scale

It is shown that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks, and the possibility of multilingual modeling without sacrificing per-language performance is shown for the first time.

Cross-Lingual Natural Language Generation via Pre-Training

Experimental results on question generation and abstractive summarization show that the model outperforms the machine-translation-based pipeline methods for zero-shot cross-lingual generation and improves NLG performance of low-resource languages by leveraging rich-resource language data.

Cross-lingual Language Model Pretraining

This work proposes two methods to learn cross-lingual language models (XLMs): one unsupervised, relying only on monolingual data, and one supervised, leveraging parallel data with a new cross-lingual language model objective.
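The supervised objective, translation language modeling (TLM), can be illustrated with a minimal sketch: a parallel sentence pair is concatenated and tokens are masked on both sides, so the model may attend to the other language's context when filling in a masked word. The masking below is simplified relative to the paper, and all names are illustrative:

```python
import random

def tlm_example(src_tokens, tgt_tokens, mask_token="[MASK]", p=0.15, seed=0):
    """TLM sketch: concatenate a parallel pair with a separator and mask
    tokens on both sides with probability p. Labels hold the original
    token at masked positions and None elsewhere (no loss there)."""
    rng = random.Random(seed)
    stream = src_tokens + ["[SEP]"] + tgt_tokens
    inputs, labels = [], []
    for tok in stream:
        if tok != "[SEP]" and rng.random() < p:
            inputs.append(mask_token)   # model must recover this token
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(None)
    return inputs, labels
```

Because both languages share one input stream, a masked French word can be predicted from its English translation's context, which is what encourages aligned cross-lingual representations.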

XNLI: Evaluating Cross-lingual Sentence Representations

This work constructs an evaluation set for XLU by extending the development and test sets of the Multi-Genre Natural Language Inference Corpus to 15 languages, including low-resource languages such as Swahili and Urdu, and finds that XNLI represents a practical and challenging evaluation suite and that directly translating the test data yields the best performance among available baselines.

Adam: A Method for Stochastic Optimization

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
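The update rule based on adaptive estimates of lower-order moments can be sketched in plain Python (a simplified single-parameter-list version, not the authors' implementation):

```python
import math

def adam_step(params, grads, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (first
    moment) and squared gradient (second moment), with bias correction."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = b1 * state["m"][i] + (1 - b1) * g        # first moment
        state["v"][i] = b2 * state["v"][i] + (1 - b2) * g * g    # second moment
        m_hat = state["m"][i] / (1 - b1 ** t)                    # bias-corrected
        v_hat = state["v"][i] / (1 - b2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params

# Minimise f(x) = x^2 starting from x = 1.0.
state = {"t": 0, "m": [0.0], "v": [0.0]}
x = [1.0]
for _ in range(2000):
    x = adam_step(x, [2 * x[0]], state, lr=0.05)
```

The bias correction divides out the zero-initialization of the moment estimates, which otherwise shrinks early updates; this is the detail the regret analysis relies on.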

Introduction to the CoNLL-2002 Shared Task: Language-Independent Named Entity Recognition

  • Erik F. Tjong Kim Sang
  • Proceedings of the 6th Conference on Natural Language Learning
  • 2002

ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training

A new sequence-to-sequence pre-training model called ProphetNet is presented, which introduces a novel self-supervised objective named future n-gram prediction and the proposed n-stream self-attention mechanism that predicts the next n tokens simultaneously based on previous context tokens at each time step.
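A minimal sketch of how future n-gram prediction targets differ from ordinary next-token targets (illustrative only; the real model scores all n continuations through its n-stream self-attention):

```python
def future_ngram_targets(tokens, n=2):
    """ProphetNet-style targets, sketched: at each position t the model is
    trained to predict the next n tokens tokens[t+1 .. t+n] simultaneously,
    not just tokens[t+1]. Positions near the end get shorter targets."""
    targets = []
    for t in range(len(tokens)):
        targets.append(tuple(tokens[t + 1 : t + 1 + n]))
    return targets

toks = ["the", "cat", "sat", "down"]
# At "the" the targets are ("cat", "sat"); at "cat", ("sat", "down"); etc.
bigram_targets = future_ngram_targets(toks, n=2)
```

With n=1 this reduces to standard teacher-forced next-token prediction, which makes clear that the novelty is the extra lookahead supervision.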

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

BART is presented, a denoising autoencoder for pretraining sequence-to-sequence models, which matches the performance of RoBERTa on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks.
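One of BART's noising schemes, text infilling, can be sketched as follows. Real BART masks multiple spans with Poisson-distributed lengths; this single fixed-length span is a simplification, and the names are illustrative:

```python
import random

def text_infilling(tokens, mask_token="<mask>", span_len=2, seed=0):
    """BART-style text infilling, sketched: replace one contiguous span of
    tokens with a single mask token. Reconstructing the original sequence
    forces the model to also infer how many tokens are missing."""
    rng = random.Random(seed)
    start = rng.randrange(max(1, len(tokens) - span_len + 1))
    return tokens[:start] + [mask_token] + tokens[start + span_len:]
```

Because a whole span collapses to one mask token, the corrupted input is shorter than the target, which is what distinguishes infilling from BERT-style per-token masking.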

MLQA: Evaluating Cross-lingual Extractive Question Answering

This work presents MLQA, a multi-way aligned extractive QA evaluation benchmark intended to spur research in this area, and evaluates state-of-the-art cross-lingual models and machine-translation-based baselines on MLQA.

PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification

PAWS-X, a new dataset of 23,659 human-translated PAWS evaluation pairs in six typologically distinct languages, shows the effectiveness of deep, multilingual pre-training while also leaving considerable headroom as a new challenge to drive multilingual research that better captures structure and contextual information.