Investigating Transfer Learning in Multilingual Pre-trained Language Models through Chinese Natural Language Inference

Hai Hu, He Zhou, Zuoyu Tian, Yiwen Zhang, Yina Ma, Yanting Li, Yixin Nie, Kyle Richardson
Multilingual transformers (XLM, mT5) have been shown to have remarkable transfer skills in zero-shot settings. Most transfer studies, however, rely on automatically translated resources (XNLI, XQuAD), making it hard to discern the particular linguistic knowledge that is being transferred, and the role of expert annotated monolingual datasets when developing task-specific models. We investigate the cross-lingual transfer abilities of XLM-R for Chinese and English natural language inference (NLI…


Analyzing Zero-shot Cross-lingual Transfer in Supervised NLP Tasks
This paper aims to validate the hypothetically strong cross-lingual transfer properties induced by XLM pretraining, evaluating XLM-RoBERTa (XLM-R) on semantic textual similarity (STS), machine reading comprehension (SQuAD and KorQuAD), sentiment analysis, and alignment of sentence embeddings under various cross-lingual settings.
XNLI: Evaluating Cross-lingual Sentence Representations
This work constructs an evaluation set for XLU by extending the development and test sets of the Multi-Genre Natural Language Inference Corpus to 15 languages, including low-resource languages such as Swahili and Urdu, and finds that XNLI represents a practical and challenging evaluation suite and that directly translating the test data yields the best performance among available baselines.
OCNLI: Original Chinese Natural Language Inference
This paper presents the first large-scale NLI dataset for Chinese called the Original Chinese Natural Language Inference dataset (OCNLI), which follows closely the annotation protocol used for MNLI, but creates new strategies for eliciting diverse hypotheses.
Cross-lingual Language Model Pretraining
This work proposes two methods to learn cross-lingual language models (XLMs): one unsupervised that only relies on monolingual data, and one supervised that leverages parallel data with a new cross-lingual language model objective.
Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT
This paper explores the broader cross-lingual potential of mBERT (multilingual BERT) as a zero-shot language transfer model on 5 NLP tasks covering a total of 39 languages from various language families: NLI, document classification, NER, POS tagging, and dependency parsing.
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models, which favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge-transfer across tasks.
ParsiNLU: A Suite of Language Understanding Challenges for Persian
ParsiNLU, the first benchmark for the Persian language, is introduced, covering a range of high-level tasks -- reading comprehension, textual entailment, etc. -- and model results are compared with human performance, providing valuable insights into the ability of current models to tackle natural language understanding challenges in Persian.
What the [MASK]? Making Sense of Language-Specific BERT Models
The current state of the art in language-specific BERT models is presented, providing an overall picture with respect to different dimensions (i.e. architectures, data domains, and tasks), and an immediate and straightforward overview of the commonalities and differences is provided.
Unsupervised Cross-lingual Representation Learning at Scale
It is shown that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks, and the possibility of multilingual modeling without sacrificing per-language performance is shown for the first time.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.