Corpus ID: 225062397

DICT-MLM: Improved Multilingual Pre-Training using Bilingual Dictionaries

@article{Chaudhary2020DICTMLMIM,
  title={DICT-MLM: Improved Multilingual Pre-Training using Bilingual Dictionaries},
  author={Aditi Chaudhary and Karthik Raman and Krishna Srinivasan and Jiecao Chen},
  journal={ArXiv},
  year={2020},
  volume={abs/2010.12566}
}
Pre-trained multilingual language models such as mBERT have shown immense gains for several natural language processing (NLP) tasks, especially in the zero-shot cross-lingual setting. Most, if not all, of these pre-trained models rely on the masked-language modeling (MLM) objective as the key language learning objective. The principle behind these approaches is that predicting the masked words with the help of the surrounding text helps learn potent contextualized representations. Despite the… 
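For context, the MLM objective corrupts a random subset of the input tokens and trains the model to recover them from the surrounding text. The Python sketch below shows the standard BERT-style masking step; the 15% masking rate and 80/10/10 replacement split follow BERT's convention, and the token IDs, mask ID, and vocabulary size are placeholders rather than anything from this paper.

import random

def mask_tokens(token_ids, mask_id, vocab_size, mlm_prob=0.15):
    # BERT-style masking: roughly 15% of positions become prediction targets.
    # Of those, 80% are replaced with [MASK], 10% with a random token,
    # and 10% are kept unchanged; all other positions are ignored by the loss.
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)  # -100 = position ignored by the loss
    for i, tok in enumerate(token_ids):
        if random.random() < mlm_prob:
            labels[i] = tok  # the model must predict the original token here
            r = random.random()
            if r < 0.8:
                inputs[i] = mask_id
            elif r < 0.9:
                inputs[i] = random.randrange(vocab_size)
            # else: keep the original token as input
    return inputs, labels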

Citations

Multilingual Code-Switching for Zero-Shot Cross-Lingual Intent Prediction and Slot Filling

TLDR
This work proposes a novel method that augments monolingual source data with multilingual code-switching via random translations, to enhance the generalizability of large multilingual language models when fine-tuning them for downstream tasks.
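As a rough, hypothetical illustration of this kind of dictionary-driven code-switching (not the authors' implementation), each source token can be swapped with some probability for a random translation drawn from one of several bilingual dictionaries:

import random

def code_switch(tokens, bilingual_dicts, switch_prob=0.3, seed=None):
    # Randomly replace source-language tokens with translations drawn from
    # one of several (hypothetical) bilingual dictionaries.
    rng = random.Random(seed)
    switched = []
    for tok in tokens:
        if rng.random() < switch_prob:
            lang = rng.choice(list(bilingual_dicts))
            translations = bilingual_dicts[lang].get(tok.lower())
            switched.append(rng.choice(translations) if translations else tok)
        else:
            switched.append(tok)
    return switched

# Example: code_switch("set an alarm for six".split(),
#                      {"es": {"alarm": ["alarma"], "six": ["seis"]}})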

Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation

TLDR
Experiments show the model consistently outperforms the strong baseline mBART with the standard fine-tuning strategy, and analyses indicate the approach narrows the Euclidean distance between cross-lingual sentence representations and improves model generalization at trivial computational cost.

Universal Conditional Masked Language Pre-training for Neural Machine Translation

TLDR
This paper proposes CeMAT, a conditional masked language model pre-trained on large-scale bilingual and monolingual corpora in many languages; it is the first work to pre-train a unified model for fine-tuning on both autoregressive and non-autoregressive NMT tasks.

CrossAligner & Co: Zero-Shot Transfer Methods for Task-Oriented Cross-lingual Natural Language Understanding

TLDR
This work introduces CrossAligner, the principal method among a variety of effective approaches for zero-shot cross-lingual transfer that learn alignment from unlabelled parallel data, and presents a quantitative analysis of the individual methods as well as their weighted combinations.

Switch Point biased Self-Training: Re-purposing Pretrained Models for Code-Switching

TLDR
This work proposes a self-training method that repurposes existing pretrained models for code-switching using a switch-point bias and unannotated data, and demonstrates that the approach performs well on both sequence labeling tasks.

When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer

TLDR
The experiments show that the absence of sub-word overlap significantly affects zero-shot transfer when languages differ in their word order, and there is a strong correlation between transfer performance and word embedding alignment between languages.

PARADISE: Exploiting Parallel Data for Multilingual Sequence-to-Sequence Pretraining

TLDR
This work proposes PARADISE (PARAllel & Denoising Integration in SEquence-to-sequence models), which extends the conventional denoising objective used to train these models by replacing words in the noised sequence according to a multilingual dictionary and by predicting the reference translation according to a parallel corpus instead of recovering the original sequence.
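A simplified, hypothetical sketch of this dictionary-based noising (word replacement only; span masking, sampling details, and the actual pre-training loop are omitted, and the dictionary is a stand-in):

import random

def paradise_example(src_tokens, ref_translation, multilingual_dict, p=0.2):
    # Encoder input: the source with some words swapped for dictionary
    # translations. Decoder target: the reference translation when parallel
    # data is available, otherwise the original (denoising) sequence.
    noised = [random.choice(multilingual_dict[t])
              if t in multilingual_dict and random.random() < p else t
              for t in src_tokens]
    target = ref_translation if ref_translation is not None else src_tokens
    return noised, target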

On the Impact of Data Augmentation on Downstream Performance in Natural Language Processing

TLDR
Evaluating the impact of 12 data augmentation methods on multiple datasets when fine-tuning pre-trained language models, this work finds only minimal improvements when data sizes are constrained to a few thousand examples, and performance degradation when the data size is increased.

SJ_AJ@DravidianLangTech-EACL2021: Task-Adaptive Pre-Training of Multilingual BERT models for Offensive Language Identification

TLDR
The final system is an ensemble of mBERT and XLM-RoBERTa models that leverage task-adaptive pre-training of multilingual BERT with a masked language modeling objective; it was ranked 1st for Kannada, 2nd for Malayalam, and 3rd for Tamil.

References

Showing 1-10 of 27 references

FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding

TLDR
FILTER is proposed, an enhanced fusion method that takes cross-lingual data as input for XLM fine-tuning, together with an additional KL-divergence self-teaching loss for model training based on auto-generated soft pseudo-labels for translated text in the target language.
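The self-teaching term can be pictured as a KL divergence between the model's predictions on the translated target-language input and auto-generated soft pseudo-labels. The PyTorch-style sketch below is a generic rendering of that idea, not FILTER's actual code; the temperature parameter is an assumption.

import torch.nn.functional as F

def self_teaching_loss(student_logits, teacher_logits, temperature=1.0):
    # KL divergence between soft pseudo-labels (teacher, detached so no
    # gradients flow into them) and the student's predictions on the
    # translated target-language text.
    teacher_probs = F.softmax(teacher_logits.detach() / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")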

Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond

TLDR
An architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts using a single BiLSTM encoder with a shared byte-pair encoding vocabulary for all languages, coupled with an auxiliary decoder and trained on publicly available parallel corpora.

Cross-lingual Language Model Pretraining

TLDR
This work proposes two methods to learn cross-lingual language models (XLMs): one unsupervised that relies only on monolingual data, and one supervised that leverages parallel data with a new cross-lingual language model objective.

Unsupervised Cross-lingual Representation Learning at Scale

TLDR
It is shown that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks, and the possibility of multilingual modeling without sacrificing per-language performance is shown for the first time.

CoSDA-ML: Multi-Lingual Code-Switching Data Augmentation for Zero-Shot Cross-Lingual NLP

TLDR
A data augmentation framework that generates multi-lingual code-switching data to fine-tune mBERT, encouraging the model to align representations from the source and multiple target languages at once by mixing their context information.

Are All Languages Created Equal in Multilingual BERT?

TLDR
This work explores how mBERT performs on a much wider set of languages, focusing on the quality of representation for low-resource languages, measured by within-language performance, and finds that better models for low resource languages require more efficient pretraining techniques or more data.

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

TLDR
The contextual representations learned by the proposed replaced token detection pre-training task substantially outperform the ones learned by methods such as BERT and XLNet given the same model size, data, and compute.
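Replaced token detection turns pre-training into per-token binary classification: a small generator fills in masked positions and the discriminator predicts, for every token, whether it was replaced. The sketch below shows how the corrupted input and 0/1 labels can be built; sample_from_generator is a hypothetical callable standing in for the generator.

def replaced_token_detection_labels(original_ids, masked_positions, sample_from_generator):
    # Corrupt the input at the masked positions using generator samples and
    # build per-token labels for the discriminator (1 = token was replaced).
    corrupted = list(original_ids)
    labels = [0] * len(original_ids)
    for pos in masked_positions:
        sampled = sample_from_generator(corrupted, pos)  # hypothetical generator call
        if sampled != original_ids[pos]:
            corrupted[pos] = sampled
            labels[pos] = 1
    return corrupted, labels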

Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation

TLDR
Evaluating the cross-lingual effectiveness of representations from the encoder of a massively multilingual NMT model on 5 downstream classification and sequence labeling tasks covering a diverse set of over 50 languages shows gains in zero-shot transfer in 4 out of 5 tasks.

Attention-Informed Mixed-Language Training for Zero-shot Cross-lingual Task-oriented Dialogue Systems

TLDR
Attention-Informed Mixed-Language Training (MLT) is introduced, a novel zero-shot adaptation method for cross-lingual task-oriented dialogue systems that leverages very few task-related parallel word pairs to generate code-switching sentences for learning the inter-lingual semantics across languages.

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

TLDR
The Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark is introduced, a multi-task benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks.