On Efficiently Acquiring Annotations for Multilingual Models

Joel Ruben Antony Moniz, Barun Patra, Matthew R. Gormley
When tasked with supporting multiple languages for a given problem, two approaches have arisen: training a model for each language with the annotation budget divided equally among them, and training on a high-resource language followed by zero-shot transfer to the remaining languages. In this work, we show that the strategy of joint learning across multiple languages using a single model performs substantially better than the aforementioned alternatives. We also demonstrate that active learning… 


75 Languages, 1 Model: Parsing Universal Dependencies Universally
Fine-tuning a multilingual BERT self-attention model pretrained on 104 languages can meet or exceed state-of-the-art UPOS, UFeats, and Lemmas scores, and especially UAS and LAS scores, without requiring any recurrent or language-specific components.
Unsupervised Cross-lingual Representation Learning at Scale
Pretraining multilingual language models at scale yields significant performance gains across a wide range of cross-lingual transfer tasks and demonstrates, for the first time, that multilingual modeling is possible without sacrificing per-language performance.
From Zero to Hero: On the Limitations of Zero-Shot Language Transfer with Multilingual Transformers
Inexpensive few-shot transfer (i.e., additional fine-tuning on a few target-language instances) is shown to be surprisingly effective across the board, warranting more research beyond the limiting zero-shot conditions.
Multilingual Denoising Pre-training for Neural Machine Translation
This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART—a
Active Learning for Dependency Parsing with Partial Annotation
This work is the first to apply a probabilistic model to active learning for dependency parsing; the model provides tree probabilities and dependency marginal probabilities as principled uncertainty metrics and learns parameters directly from partial annotation (PA) using a forest-based training objective.
Bilingual Active Learning for Relation Classification via Pseudo Parallel Corpora
Experimental results on the ACE RDC 2005 Chinese and English corpora show that bilingual active learning for relation classification significantly outperforms monolingual active learning.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer
The recent “Text-to-Text Transfer Transformer” (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this
Transformers: State-of-the-Art Natural Language Processing
Transformers is an open-source library that consists of carefully engineered state-of-the-art Transformer architectures under a unified API and a curated collection of pretrained models made by and available for the community.
Cross-Language Text Classification Using Structural Correspondence Learning
We present a new approach to cross-language text classification that builds on structural correspondence learning, a recently proposed theory for domain adaptation. The approach uses unlabeled