Corpus ID: 225062397

DICT-MLM: Improved Multilingual Pre-Training using Bilingual Dictionaries

@article{Chaudhary2020DICTMLMIM,
  title={DICT-MLM: Improved Multilingual Pre-Training using Bilingual Dictionaries},
  author={Aditi Chaudhary and Karthik Raman and Krishna Srinivasan and Jiecao Chen},
  journal={ArXiv},
  year={2020},
  volume={abs/2010.12566}
}
Pre-trained multilingual language models such as mBERT have shown immense gains for several natural language processing (NLP) tasks, especially in the zero-shot cross-lingual setting. Most, if not all, of these pre-trained models rely on the masked-language modeling (MLM) objective as the key language learning objective. The principle behind these approaches is that predicting the masked words with the help of the surrounding text helps learn potent contextualized representations. Despite the… 
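
As a rough illustration of the masking idea described above and the dictionary-augmented variant suggested by the title, the sketch below masks tokens and, where a bilingual dictionary entry exists, uses the translation as the prediction target rather than the original word. This is a minimal sketch assuming whitespace tokenization and a toy English-French dictionary; `build_dict_mlm_example`, `BILINGUAL_DICT`, and the 15% masking rate are illustrative choices, not the paper's actual implementation.

```python
import random

# Toy bilingual dictionary (English -> French); a real setup would use large
# dictionaries (e.g. MUSE) and subword tokenization.
BILINGUAL_DICT = {
    "cat": "chat",
    "house": "maison",
    "drinks": "boit",
    "milk": "lait",
}

MASK, MASK_PROB = "[MASK]", 0.15

def build_dict_mlm_example(tokens, dictionary=BILINGUAL_DICT, mask_prob=MASK_PROB):
    """Mask random tokens; when a dictionary translation exists, use it as the
    prediction target instead of the original word (otherwise fall back to
    plain MLM). Returns (masked_tokens, targets), where targets[i] is None for
    unmasked positions. Illustration only, not the paper's implementation."""
    masked, targets = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            masked.append(MASK)
            # Plain MLM would predict `tok`; a dictionary-augmented objective
            # asks for its cross-lingual translation instead.
            targets.append(dictionary.get(tok.lower(), tok))
        else:
            masked.append(tok)
            targets.append(None)
    return masked, targets

if __name__ == "__main__":
    random.seed(0)
    print(build_dict_mlm_example("The cat drinks milk near the house".split()))
```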

Citations of this paper

Multilingual Code-Switching for Zero-Shot Cross-Lingual Intent Prediction and Slot Filling

This work proposes a novel method which augments monolingual source data using multilingual code-switching via random translations, to enhance the generalizability of large multilingual language models when fine-tuning them for downstream tasks.
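
A minimal sketch of this kind of dictionary-based code-switching augmentation, assuming whitespace tokenization and a toy word-level dictionary; `code_switch`, `DICTS`, and `switch_prob` are hypothetical names and values, and a real pipeline would rely on larger lexicons or translation models.

```python
import random

# Toy multilingual dictionary: English word -> {language: translation}.
DICTS = {
    "book":  {"es": "libro", "de": "Buch"},
    "table": {"es": "mesa",  "de": "Tisch"},
    "red":   {"es": "rojo",  "de": "rot"},
}

def code_switch(sentence, switch_prob=0.3, rng=random):
    """Randomly translate individual words into a randomly chosen target
    language, turning monolingual source text into code-switched text."""
    out = []
    for word in sentence.split():
        translations = DICTS.get(word.lower())
        if translations and rng.random() < switch_prob:
            lang = rng.choice(sorted(translations))  # pick a target language
            out.append(translations[lang])
        else:
            out.append(word)
    return " ".join(out)

if __name__ == "__main__":
    random.seed(1)
    print(code_switch("Put the red book on the table"))
```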

GreenPLM: Cross-lingual pre-trained language models conversion with (almost) no cost

This study proposes an effective and energy-efficient framework GreenPLM that uses bilingual lexicons to directly translate language models of one language into other languages at (almost) no additional cost and outperforms the original monolingual language models in six out of seven tested languages.
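
The sketch below shows one way a bilingual lexicon could be used to carry embedding rows from a source-language model over to a target-language vocabulary, which is the spirit of the translation step described above; `translate_embeddings` and the fallback-to-mean handling for out-of-lexicon words are assumptions made for illustration, not GreenPLM's actual conversion procedure.

```python
import numpy as np

def translate_embeddings(src_emb, src_vocab, tgt_vocab, lexicon):
    """Copy each target word's embedding from the source embedding of its
    lexicon translation (lexicon maps target word -> source word); words with
    no entry fall back to the mean source embedding. Illustration only."""
    fallback = src_emb.mean(axis=0)
    tgt_emb = np.tile(fallback, (len(tgt_vocab), 1))
    src_index = {w: i for i, w in enumerate(src_vocab)}
    for j, tgt_word in enumerate(tgt_vocab):
        src_word = lexicon.get(tgt_word)
        if src_word in src_index:
            tgt_emb[j] = src_emb[src_index[src_word]]
    return tgt_emb

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src_vocab = ["cat", "dog", "house"]
    src_emb = rng.normal(size=(len(src_vocab), 4))
    tgt_vocab = ["chat", "maison", "velo"]  # "velo" has no lexicon entry
    lexicon = {"chat": "cat", "maison": "house"}
    print(translate_embeddings(src_emb, src_vocab, tgt_vocab, lexicon).round(2))
```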

Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation

Experiments show the model consistently outperforms the strong baseline mBART with the standard fine-tuning strategy, and analyses indicate the approach narrows the Euclidean distance between cross-lingual sentence representations and improves model generalization at trivial computational cost.

Universal Conditional Masked Language Pre-training for Neural Machine Translation

This paper proposes CeMAT, a conditional masked language model pre-trained on large-scale bilingual and monolingual corpora in many languages, and is the first work to pre-train a unified model for fine-tuning on both autoregressive and non-autoregressive NMT tasks.

CrossAligner & Co: Zero-Shot Transfer Methods for Task-Oriented Cross-lingual Natural Language Understanding

This work introduces CrossAligner, the principal method of a variety of effective approaches for zero-shot cross-lingual transfer based on learning alignment from unlabelled parallel data, and presents a quantitative analysis of individual methods as well as their weighted combinations.

Switch Point biased Self-Training: Re-purposing Pretrained Models for Code-Switching

This work proposes a self-training method to repurpose existing pretrained models using a switch-point bias by leveraging unannotated data, and demonstrates that the approach performs well on both sequence-labeling tasks.

When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer

The experiments show that the absence of sub-word overlap significantly affects zero-shot transfer when languages differ in their word order, and there is a strong correlation between transfer performance and word embedding alignment between languages.

PARADISE: Exploiting Parallel Data for Multilingual Sequence-to-Sequence Pretraining

PARADISE (PARAllel & Denoising Integration in SEquence-to-sequence models) extends the conventional denoising objective used to train these models by replacing words in the noised sequence according to a multilingual dictionary, and by predicting the reference translation according to a parallel corpus instead of recovering the original sequence.
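
A minimal sketch of the training-pair construction described above, assuming a toy dictionary and whitespace tokenization: the encoder input is the dictionary-noised source, and the decoder target is the reference translation when parallel data is available, otherwise the original sentence (plain denoising). `paradise_example` and its parameters are illustrative names, not the authors' code.

```python
import random

# Toy multilingual dictionary used to replace words in the noised input.
DICT = {"dog": "perro", "barks": "ladra", "loudly": "fuerte"}

def paradise_example(src_sentence, ref_translation=None, replace_prob=0.5, rng=random):
    """Return one (noised_source, target) training pair in the spirit of the
    objective summarized above: words are randomly swapped for dictionary
    translations, and the target is the reference translation if one exists,
    otherwise the original sentence."""
    noised = [
        DICT.get(w.lower(), w) if rng.random() < replace_prob else w
        for w in src_sentence.split()
    ]
    target = ref_translation if ref_translation is not None else src_sentence
    return " ".join(noised), target

if __name__ == "__main__":
    random.seed(2)
    print(paradise_example("The dog barks loudly", "El perro ladra fuerte"))
    print(paradise_example("The dog barks loudly"))  # monolingual (denoising) case
```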

On the Impact of Data Augmentation on Downstream Performance in Natural Language Processing

Evaluating the impact of 12 data augmentation methods on multiple datasets when fine-tuning pre-trained language models finds minimal improvements when data sizes are constrained to a few thousand examples, with performance degradation when data size is increased.

References

SHOWING 1-10 OF 27 REFERENCES

Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond

An architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts using a single BiLSTM encoder with a shared byte-pair encoding vocabulary for all languages, coupled with an auxiliary decoder and trained on publicly available parallel corpora.

Cross-lingual Language Model Pretraining

This work proposes two methods to learn cross-lingual language models (XLMs): one unsupervised that relies only on monolingual data, and one supervised that leverages parallel data with a new cross-lingual language model objective.

Unsupervised Cross-lingual Representation Learning at Scale

It is shown that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks, and the possibility of multilingual modeling without sacrificing per-language performance is shown for the first time.

CoSDA-ML: Multi-Lingual Code-Switching Data Augmentation for Zero-Shot Cross-Lingual NLP

A data augmentation framework to generate multi-lingual code-switching data to fine-tune mBERT, which encourages the model to align representations from the source and multiple target languages at once by mixing their context information.

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

The contextual representations learned by the proposed replaced token detection pre-training task substantially outperform the ones learned by methods such as BERT and XLNet given the same model size, data, and compute.
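
To make the replaced-token-detection objective concrete, the sketch below corrupts a sentence and records a per-token binary label for the discriminator to predict; random sampling from a toy vocabulary stands in for ELECTRA's small masked-LM generator, and `make_rtd_example` is an illustrative name, not part of the ELECTRA codebase.

```python
import random

VOCAB = ["the", "cat", "dog", "sat", "ran", "on", "under", "mat", "rug"]

def make_rtd_example(tokens, replace_prob=0.15, rng=random):
    """Replace some tokens with sampled alternatives and label each position
    1 if replaced, 0 if original; the discriminator is trained to predict
    these labels for every token rather than to reconstruct masked words."""
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < replace_prob:
            corrupted.append(rng.choice([w for w in VOCAB if w != tok]))
            labels.append(1)   # replaced token
        else:
            corrupted.append(tok)
            labels.append(0)   # original token
    return corrupted, labels

if __name__ == "__main__":
    random.seed(3)
    print(make_rtd_example("the cat sat on the mat".split()))
```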

Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation

Evaluating the cross-lingual effectiveness of representations from the encoder of a massively multilingual NMT model on 5 downstream classification and sequence labeling tasks covering a diverse set of over 50 languages shows gains in zero-shot transfer in 4 out of 5 tasks.

Attention-Informed Mixed-Language Training for Zero-shot Cross-lingual Task-oriented Dialogue Systems

Attention-Informed Mixed-Language Training (MLT) is introduced, a novel zero-shot adaptation method for cross-lingual task-oriented dialogue systems that leverages very few task-related parallel word pairs to generate code-switching sentences for learning the inter-lingual semantics across languages.

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

The Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark is introduced, a multi-task benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks.

Word Translation Without Parallel Data

It is shown that a bilingual dictionary can be built between two languages without using any parallel corpora, by aligning monolingual word embedding spaces in an unsupervised way.
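
As a worked example of the alignment idea, the sketch below solves the orthogonal Procrustes problem that maps one embedding space onto another given seed word pairs; in the fully unsupervised setting of this paper the seed pairs are induced (e.g. adversarially) rather than supplied, so this shows only the refinement-style step, not the full method.

```python
import numpy as np

def procrustes(X, Y):
    """Solve W = argmin ||X W - Y||_F over orthogonal W via the SVD of X^T Y.
    Rows of X are source-space vectors of seed pairs, rows of Y the matching
    target-space vectors; mapping all source embeddings with W and taking
    nearest neighbours in the target space yields dictionary candidates."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tgt = rng.normal(size=(5, 3))                      # toy target-space vectors
    true_W = np.linalg.qr(rng.normal(size=(3, 3)))[0]  # hidden orthogonal map
    src = tgt @ true_W.T                               # source space = rotated target
    W = procrustes(src, tgt)
    print(np.allclose(src @ W, tgt))                   # True: rotation recovered
```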

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.