Corpus ID: 235727612

A Primer on Pretrained Multilingual Language Models

@article{Doddapaneni2021APO,
  title={A Primer on Pretrained Multilingual Language Models},
  author={Sumanth Doddapaneni and Gowtham Ramesh and Anoop Kunchukuttan and Pratyush Kumar and Mitesh M. Khapra},
  journal={ArXiv},
  year={2021},
  volume={abs/2107.00676}
}
Multilingual Language Models (MLLMs) such as mBERT, XLM, XLM-R, etc. have emerged as a viable option for bringing the power of pretraining to a large number of languages. Given their success in zero-shot transfer learning, there has emerged a large body of work in (i) building bigger MLLMs covering a large number of languages, (ii) creating exhaustive benchmarks covering a wider variety of tasks and languages for evaluating MLLMs, and (iii) analysing the performance of MLLMs on monolingual, zero-shot…
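The abstract centres on zero-shot cross-lingual transfer: an MLLM is fine-tuned on labelled data in one language (typically English) and then evaluated directly on others. Below is a minimal sketch of loading such a model with the Hugging Face transformers library; the checkpoint name, the three-way label set, and the Hindi sentence pair are illustrative assumptions rather than details from the paper, and in practice the classification head would first be fine-tuned on English task data.

```python
# Minimal sketch: load a pretrained multilingual encoder (here XLM-R) and run it
# on text in a language it was never fine-tuned on. In the usual zero-shot setup,
# the classification head would first be fine-tuned on labelled English data only;
# here we just show the forward pass.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "xlm-roberta-base"  # any MLLM checkpoint (mBERT, XLM-R, ...) works here

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)

# A Hindi premise/hypothesis pair, in the style of XNLI-like zero-shot evaluation.
inputs = tokenizer("यह एक उदाहरण वाक्य है।", "यह एक वाक्य है।", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # class probabilities from the (untrained) head
```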

Citations

mGPT: Few-Shot Learners Go Multilingual
TLDR
This paper introduces two autoregressive GPT-like models with 1.3 billion and 13 billion parameters, trained on 60 languages from 25 language families using Wikipedia and the Colossal Clean Crawled Corpus, and trains small versions of the model to choose the most suitable multilingual tokenization strategy.
When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer
TLDR
The experiments show that the absence of sub-word overlap significantly affects zero-shot transfer when languages differ in their word order, and there is a strong correlation between transfer performance and word embedding alignment between languages.
Multilingual Transformer Encoders: a Word-Level Task-Agnostic Evaluation
TLDR
This work proposes a word-level task-agnostic method to evaluate the alignment of contextualized representations built by transformer-based models and shows that this method provides more accurate translated word pairs than previous methods for evaluating word-level alignment.
Multilingualism Encourages Recursion: a Transfer Study with mBERT
TLDR
This work investigates the relational structures learnt by mBERT, a multilingual transformer-based network, with respect to different cross-linguistic regularities proposed in theoretical and quantitative linguistics, relying on a zero-shot transfer experiment and comparing its performance to the output of BERT.
mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models
TLDR
A multilingual language model covering 24 languages with entity representations is trained, and it is shown that the model consistently outperforms word-based pretrained models in various cross-lingual transfer tasks.
Improving Word Translation via Two-Stage Contrastive Learning
TLDR
This work proposes a robust and effective two-stage contrastive learning framework for the BLI (bilingual lexicon induction) task, which refines standard cross-lingual linear maps between static word embeddings (WEs) via a contrastive learning objective (a sketch of the underlying linear-map baseline follows this list).
The Impact of Cross-Lingual Adjustment of Contextual Word Representations on Zero-Shot Transfer
TLDR
Analysis of distances between contextualized embeddings of related and unrelated words across languages showed that fine-tuning leads to “forgetting” some of the cross-lingual alignment information, which can negatively affect the effectiveness of zero-shot transfer.
Pretraining with Artificial Language: Studying Transferable Knowledge in Language Models
TLDR
The experimental results show that pretraining with an artificial language with a nesting dependency structure provides some knowledge transferable to natural language, and a follow-up probing analysis indicates that its success in the transfer is related to the amount of encoded contextual information.
EENLP: Cross-lingual Eastern European NLP Index
TLDR
A broad index of NLP resources for Eastern European languages, which, it is hoped, could be helpful for the NLP community; several new hand-crafted cross-lingual datasets focused on Eastern European languages; and a sketch evaluation of the cross-lingual transfer learning abilities of several modern multilingual Transformer-based models.
CALCS 2021 Shared Task: Machine Translation for Code-Switched Data
TLDR
This paper addresses machine translation for code-switched social media data with baselines for all language pairs in a community shared task and shares insights and challenges in curating the "into" code-switched language evaluation data.
...
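The word-translation entry above builds on the standard setup of mapping static word embeddings across languages with a linear transform. The sketch below shows only that orthogonal-Procrustes baseline, not the paper's two-stage contrastive objective; the dimensions and random vectors are placeholders for real fastText-style embeddings and a seed translation dictionary.

```python
# Sketch of the orthogonal-Procrustes baseline that BLI methods refine.
# X and Y hold static word embeddings of seed translation pairs (one pair per
# column); the learned orthogonal W maps the source space into the target space.
import numpy as np

rng = np.random.default_rng(0)
d, n = 300, 5000                  # embedding dim, number of seed pairs (illustrative)
X = rng.standard_normal((d, n))   # source-language word vectors
Y = rng.standard_normal((d, n))   # target-language word vectors (aligned pairs)

# Solve min_W ||W X - Y||_F subject to W being orthogonal.
U, _, Vt = np.linalg.svd(Y @ X.T)
W = U @ Vt

mapped = W @ X                    # source vectors projected into the target space
# Word translation then reduces to nearest-neighbour search between `mapped`
# and the full target vocabulary (typically with cosine similarity or CSLS).
```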

References

SHOWING 1-10 OF 155 REFERENCES
UNKs Everywhere: Adapting Multilingual Language Models to New Scripts
TLDR
This work proposes a series of novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts and demonstrates that they can yield improvements for low-resource languages written in scripts covered by the pretrained model.
When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models
TLDR
It is shown that transliterating unseen languages significantly improves the potential of large-scale multilingual language models on downstream tasks and provides a promising direction towards making these massively multilingual models useful for a new set of unseen languages.
On Learning Universal Representations Across Languages
TLDR
Hierarchical Contrastive Learning (HiCTL) is proposed to learn universal representations for parallel sentences distributed in one or multiple languages and distinguish the semantically-related words from a shared cross-lingual vocabulary for each sentence.
MergeDistill: Merging Pre-trained Language Models using Distillation
TLDR
MERGEDISTILL is proposed, a framework to merge pre-trained LMs in a way that best leverages their assets with minimal dependencies, using task-agnostic knowledge distillation, and the applicability of the framework in a practical setting is demonstrated.
Inducing Language-Agnostic Multilingual Representations
TLDR
Three approaches for removing language identity signals from multilingual embeddings are examined: re-aligning the vector spaces of target languages (all together) to a pivot source language, removing language-specific means and variances (a sketch of this step follows the reference list), and increasing input similarity across languages by removing morphological contractions and sentence reordering.
Unsupervised Cross-lingual Representation Learning at Scale
TLDR
It is shown that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks, and the possibility of multilingual modeling without sacrificing per-language performance is shown for the first time.
Multilingual is not enough: BERT for Finnish
TLDR
While the multilingual model largely fails to reach the performance of previously proposed methods, the custom Finnish BERT model establishes new state-of-the-art results on all corpora for all reference tasks: part-of-speech tagging, named entity recognition, and dependency parsing.
Explicit Alignment Objectives for Multilingual Bidirectional Encoders
TLDR
A new method for learning multilingual encoders, AMBER (Aligned Multilingual Bidirectional EncodeR), trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities is presented.
On the Cross-lingual Transferability of Monolingual Representations
TLDR
This work designs an alternative approach that transfers a monolingual model to new languages at the lexical level and shows that it is competitive with multilingual BERT on standard cross-lingual classification benchmarks and on a new Cross-lingual Question Answering Dataset (XQuAD).
Are All Languages Created Equal in Multilingual BERT?
TLDR
This work explores how mBERT performs on a much wider set of languages, focusing on the quality of representation for low-resource languages, measured by within-language performance, and finds that better models for low-resource languages require more efficient pretraining techniques or more data.
...
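One of the referenced approaches for inducing language-agnostic representations removes language-specific means and variances from embeddings before cross-lingual comparison. The sketch below illustrates that normalization step only; the languages, array shapes, and random vectors are illustrative stand-ins for real encoder outputs.

```python
# Sketch of per-language mean/variance removal to suppress language-identity
# signals in multilingual embeddings. The embeddings here are random stand-ins
# for sentence representations produced by a multilingual encoder.
import numpy as np

rng = np.random.default_rng(0)
emb = {                                   # {language: (num_sentences, dim) embeddings}
    "en": rng.standard_normal((1000, 768)) + 0.5,
    "de": rng.standard_normal((1000, 768)) - 0.3,
}

def remove_language_identity(vectors: np.ndarray) -> np.ndarray:
    """Centre and scale embeddings per language before cross-lingual comparison."""
    mean = vectors.mean(axis=0, keepdims=True)
    std = vectors.std(axis=0, keepdims=True) + 1e-8
    return (vectors - mean) / std

normalized = {lang: remove_language_identity(v) for lang, v in emb.items()}
```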