Language Chameleon: Transformation analysis between languages using Cross-lingual Post-training based on Pre-trained language models

Suhyune Son, Chanjun Park, Jungseob Lee, Midan Shim, Chanhee Lee, Yoonna Jang, Jaehyung Seo, Heuiseok Lim
As pre-trained language models become more resource-demanding, the inequality between resource-rich languages such as English and resource-scarce languages is worsening. This can be attributed to the fact that the amount of available training data in each language follows a power-law distribution, and most languages belong to the long tail of that distribution. Several research directions attempt to mitigate this problem. For example, in cross-lingual transfer learning and multilingual…



Exploring the Data Efficiency of Cross-Lingual Post-Training in Pretrained Language Models

Quantitative results from intrinsic and extrinsic evaluations show that the novel cross-lingual post-training approach outperforms several massively multilingual and monolingual pretrained language models in most settings and improves data efficiency by a factor of up to 32 compared to monolingual training.

Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks

It is found that fine-tuning on multiple languages together brings further improvement to Unicoder, a universal language encoder that is insensitive to different languages.

Cross-lingual Language Model Pretraining

This work proposes two methods to learn cross-lingual language models (XLMs): one unsupervised that only relies on monolingual data, and one supervised that leverages parallel data with a new cross-lingual language model objective.
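As a rough illustration (not the paper's implementation), the supervised objective masks random tokens in a concatenated parallel sentence pair, so the model can attend across languages to recover them. The function and separator token below are hypothetical sketches:

```python
import random

MASK = "[MASK]"
SEP = "[/s]"  # hypothetical separator between the two languages

def tlm_example(src_tokens, tgt_tokens, mask_prob=0.15, seed=0):
    """Concatenate a parallel sentence pair and mask random tokens.

    Returns (masked_sequence, labels), where labels hold the original
    token at each masked position and None elsewhere. Because both
    languages share one input stream, the model may use the translation
    to predict a masked word.
    """
    rng = random.Random(seed)
    sequence = src_tokens + [SEP] + tgt_tokens
    masked, labels = [], []
    for tok in sequence:
        if tok != SEP and rng.random() < mask_prob:
            masked.append(MASK)   # hide the token from the model
            labels.append(tok)    # prediction target
        else:
            masked.append(tok)
            labels.append(None)   # not part of the loss
    return masked, labels

masked, labels = tlm_example(["the", "cat", "sleeps"], ["le", "chat", "dort"])
```

The unsupervised variant is the same masking scheme applied to a single monolingual sentence rather than a parallel pair.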

Unsupervised Cross-lingual Representation Learning at Scale

It is shown that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks, and the possibility of multilingual modeling without sacrificing per-language performance is shown for the first time.

On the Cross-lingual Transferability of Monolingual Representations

This work designs an alternative approach that transfers a monolingual model to new languages at the lexical level and shows that it is competitive with multilingual BERT on standard cross-lingual classification benchmarks and on a new Cross-lingual Question Answering Dataset (XQuAD).

Adapting Monolingual Models: Data can be Scarce when Language Similarity is High

This work retrains the lexical layers of four BERT-based models using data from two low-resource target language varieties, while the Transformer layers are independently fine-tuned on a POS-tagging task in the model's source language.

Cross-Lingual Ability of Multilingual BERT: An Empirical Study

A comprehensive study of the contribution of different components in M-BERT to its cross-lingual ability, finding that the lexical overlap between languages plays a negligible role, while the depth of the network is an integral part of it.

MAD-X: An Adapter-based Framework for Multi-task Cross-lingual Transfer

MAD-X is proposed, an adapter-based framework that enables high portability and parameter-efficient transfer to arbitrary tasks and languages by learning modular language and task representations and introduces a novel invertible adapter architecture and a strong baseline method for adapting a pretrained multilingual model to a new language.
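The invertible adapters are specific to MAD-X, but the bottleneck adapter design it builds on can be sketched as a small residual module inserted into a frozen transformer layer. This is a hypothetical numpy sketch, not the paper's code:

```python
import numpy as np

class BottleneckAdapter:
    """Down-project, apply a nonlinearity, up-project, add a residual.

    Only the small W_down/W_up matrices are trained per language or
    task; the surrounding transformer weights stay frozen, which is
    what makes the transfer parameter-efficient and modular.
    """

    def __init__(self, hidden_dim, bottleneck_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W_down = rng.normal(0.0, 0.02, (hidden_dim, bottleneck_dim))
        # Zero-initialized up-projection: the adapter starts as an
        # identity map, leaving the pretrained model's behavior intact.
        self.W_up = np.zeros((bottleneck_dim, hidden_dim))

    def __call__(self, h):
        z = np.maximum(h @ self.W_down, 0.0)  # ReLU bottleneck
        return h + z @ self.W_up              # residual connection

adapter = BottleneckAdapter(hidden_dim=16, bottleneck_dim=4)
h = np.ones((2, 16))
out = adapter(h)
```

At inference time, stacking a target-language adapter with a task adapter trained on the source language is what enables MAD-X's zero-shot transfer; the sketch above shows only a single module.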

mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models

A multilingual language model covering 24 languages with entity representations is trained, and it is shown that the model consistently outperforms word-based pretrained models in various cross-lingual transfer tasks.

On Difficulties of Cross-Lingual Transfer with Order Differences: A Case Study on Dependency Parsing

Investigating cross-lingual transfer, this work posits that an order-agnostic model will perform better when transferring to distant foreign languages, and shows that RNN-based architectures transfer well to languages that are close to English, while self-attentive models have better overall cross-lingual transferability and perform especially well on distant languages.