Analyzing Zero-shot Cross-lingual Transfer in Supervised NLP Tasks

  title={Analyzing Zero-shot Cross-lingual Transfer in Supervised NLP Tasks},
  author={Hyunjin Choi and Judong Kim and Seongho Joe and Seungjai Min and Youngjune Gwon},
  journal={2020 25th International Conference on Pattern Recognition (ICPR)},
In zero-shot cross-lingual transfer, a supervised NLP task trained on a corpus in one language is directly applicable to another language without any additional training. A source of cross-lingual transfer can be as straightforward as lexical overlap between languages (e.g., use of the same scripts, shared subwords) that naturally forces text embeddings to occupy a similar representation space. Recently introduced cross-lingual language model (XLM) pretraining brings out neural parameter… 

Figures and Tables from this paper

Investigating Transfer Learning in Multilingual Pre-trained Language Models through Chinese Natural Language Inference
This work investigates the cross-lingual transfer abilities of XLM-R for Chinese and English natural language inference (NLI), with a focus on the recent largescale Chinese dataset OCNLI.
CABACE: Injecting Character Sequence Information and Domain Knowledge for Enhanced Acronym and Long-Form Extraction
This work proposes a novel framework CABACE: Character-Aware BERT for ACronym Extraction, which takes into account character sequences in text, and is adapted to scientific and legal domains by masked language modelling.


Cross-lingual Language Model Pretraining
This work proposes two methods to learn cross-lingual language models (XLMs): one unsupervised that only relies on monolingual data, and one supervised that leverages parallel data with a new cross-lingsual language model objective.
Unsupervised Cross-lingual Representation Learning at Scale
It is shown that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks, and the possibility of multilingual modeling without sacrificing per-language performance is shown for the first time.
Automatic Spanish Translation of the SQuAD Dataset for Multilingual Question Answering
The Translate Align Retrieve (TAR) method is developed to automatically translate the Stanford Question Answering Dataset (SQuAD) v1.1 to Spanish, and this dataset is used to train Spanish QA systems by fine-tuning a Multilingual-BERT model.
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models, which favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge-transfer across tasks.
How to (Properly) Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions
It is empirically demonstrate that the performance of CLE models largely depends on the task at hand and that optimizing CLE models for BLI may hurt downstream performance, and indicates the most robust supervised and unsupervised CLE models.
SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation
The STS Benchmark is introduced as a new shared training and evaluation set carefully selected from the corpus of English STS shared task data (2012-2017), providing insight into the limitations of existing models.
SemEval-2014 Task 10: Multilingual Semantic Textual Similarity
This year, the participants were challenged with new data sets for English, as well as the introduction of Spanish, as a new language in which to assess semantic similarity, and the annotations for both tasks leveraged crowdsourcing.
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Sentence-BERT (SBERT), a modification of the pretrained BERT network that use siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine-similarity is presented.
KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding
New datasets for Korean NLI and STS are constructed and released, dubbed KorNLI and KorSTS, respectively, following previous approaches, which machine-translate existing English training sets and manually translate development and test sets into Korean.
Attention is All you Need
A new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely is proposed, which generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.