UNKs Everywhere: Adapting Multilingual Language Models to New Scripts

@article{Pfeiffer2021UNKsEA,
  title={UNKs Everywhere: Adapting Multilingual Language Models to New Scripts},
  author={Jonas Pfeiffer and Ivan Vuli{\'c} and Iryna Gurevych and Sebastian Ruder},
  journal={ArXiv},
  year={2021},
  volume={abs/2012.15562}
}
Massively multilingual language models such as multilingual BERT offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks. However, due to limited capacity and large differences in pretraining data sizes, there is a profound performance gap between resource-rich and resource-poor target languages. The ultimate challenge is dealing with under-resourced languages not covered at all by the models and written in scripts unseen during pretraining. In this work, we propose a… 
AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages
Pretrained multilingual models are able to perform cross-lingual transfer in a zero-shot setting, even for languages unseen during pretraining. However, prior work evaluating performance on unseen…
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models
TLDR: It is found that replacing the original multilingual tokenizer with the specialized monolingual tokenizer improves the downstream performance of the multilingual model for almost every task and language.
Crossing the Conversational Chasm: A Primer on Multilingual Task-Oriented Dialogue Systems
TLDR: This work identifies two main challenges that, combined, hinder faster progress in multilingual TOD: current state-of-the-art TOD models based on large pretrained neural language models are data-hungry, while data acquisition for TOD use cases is expensive and tedious.
MasakhaNER: Named Entity Recognition for African Languages
TLDR: This work brings together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages and details the characteristics of these languages to help researchers and practitioners better understand the challenges they pose for NER tasks.
Lifting the Curse of Multilinguality by Pre-training Modular Transformers
TLDR: This work introduces language-specific modules in its Cross-lingual Modular models from the start of pre-training, which allows the total capacity of the model to grow while keeping the number of trainable parameters per language constant.
TUDa at WMT21: Sentence-Level Direct Assessment with Adapters
TLDR: This work utilizes massively multilingual language models that only partly cover the target languages during pre-training, extends them to new languages and unseen scripts using recent adapter-based methods, and achieves performance on par with, or even surpassing, models pre-trained on the respective languages.
Language Modelling with Pixels
TLDR: PIXEL is a pretrained language model that renders text as images, making it possible to transfer representations across languages based on orthographic similarity or the co-activation of pixels, and is more robust to noisy text inputs than BERT, further confirming the benefits of modelling language with pixels.
Hyper-X: A Unified Hypernetwork for Multi-Task Multilingual Transfer
TLDR: Hyper-X is proposed, a hypernetwork that generates weights for parameter-efficient adapter modules conditioned on both task and language embeddings, enabling zero-shot transfer for unseen languages and task-language combinations.
Adapting BigScience Multilingual Model to Unseen Languages
We benchmark different strategies for adding new languages (German and Korean) to BigScience's pretrained multilingual language model with 1.3 billion parameters, which currently supports 13 languages…
Specializing Multilingual Language Models: An Empirical Study
TLDR: These evaluations on part-of-speech tagging, universal dependency parsing, and named entity recognition in nine diverse low-resource languages uphold the viability of these approaches while raising new questions around how to optimally adapt multilingual models to low-resource settings.
...

References

Showing 1-10 of 52 references
When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models
TLDR: It is shown that transliterating unseen languages significantly improves the potential of large-scale multilingual language models on downstream tasks and provides a promising direction towards making these massively multilingual models useful for a new set of unseen languages.
MAD-X: An Adapter-based Framework for Multi-task Cross-lingual Transfer
TLDR: MAD-X is proposed, an adapter-based framework that enables highly portable and parameter-efficient transfer to arbitrary tasks and languages by learning modular language and task representations; it also introduces a novel invertible adapter architecture and a strong baseline method for adapting a pretrained multilingual model to a new language.
On the Cross-lingual Transferability of Monolingual Representations
TLDR: This work designs an alternative approach that transfers a monolingual model to new languages at the lexical level and shows that it is competitive with multilingual BERT on standard cross-lingual classification benchmarks and on a new Cross-lingual Question Answering Dataset (XQuAD).
Emerging Cross-lingual Structure in Pretrained Language Models
TLDR: It is shown that transfer is possible even when there is no shared vocabulary across the monolingual corpora and also when the text comes from very different domains, and it is strongly suggested that, much like for non-contextual word embeddings, there are universal latent symmetries in the learned embedding spaces.
Unsupervised Cross-lingual Representation Learning at Scale
TLDR: It is shown that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks, and the possibility of multilingual modeling without sacrificing per-language performance is shown for the first time.
From Zero to Hero: On the Limitations of Zero-Shot Language Transfer with Multilingual Transformers
TLDR: It is demonstrated that inexpensive few-shot transfer (i.e., additional fine-tuning on a few target-language instances) is surprisingly effective across the board, warranting more research efforts that reach beyond the limiting zero-shot conditions.
Are All Languages Created Equal in Multilingual BERT?
TLDR: This work explores how mBERT performs on a much wider set of languages, focusing on the quality of representations for low-resource languages as measured by within-language performance, and finds that better models for low-resource languages require more efficient pretraining techniques or more data.
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
TLDR: The Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark is introduced, a multi-task benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks.
Rethinking embedding coupling in pre-trained language models
TLDR: The analysis shows that larger output embeddings prevent the model's last layers from overspecializing to the pre-training task and encourage Transformer representations to be more general and more transferable to other tasks and languages.
Extending Multilingual BERT to Low-Resource Languages
TLDR: This paper proposes a simple but effective approach to extend M-BERT (E-MBERT) so it can benefit any new language, and shows that this approach aids languages that are already in M-BERT as well.
...