Corpus ID: 237532187

Locating Language-Specific Information in Contextualized Embeddings

@article{Liang2021LocatingLI,
  title={Locating Language-Specific Information in Contextualized Embeddings},
  author={Sheng Liang and Philipp Dufter and Hinrich Sch{\"u}tze},
  journal={ArXiv},
  year={2021},
  volume={abs/2109.08040}
}
Multilingual pretrained language models (MPLMs) exhibit multilinguality and are well suited for cross-lingual transfer. Most MPLMs are trained in an unsupervised fashion, and the relationship between their training objective and multilinguality is unclear. More specifically, the question arises whether MPLM representations are language-agnostic or whether they simply interleave well with learned task prediction heads. In this work, we locate language-specific information in MPLMs and identify its…
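As a minimal, hedged sketch (not the authors' actual method), one way to look for language-specific information is to extract contextual embeddings from an MPLM and fit a linear language-identification probe; the probe's accuracy and weight vector give a rough estimate of a language-specific direction in the embedding space. The model name, the toy English/German sentences, mean pooling, and the scikit-learn probe below are all illustrative assumptions.

# A minimal sketch, assuming "bert-base-multilingual-cased" as the MPLM,
# mean pooling over tokens, and a toy English/German sentence set.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

sentences = ["The cat sits on the mat.", "She reads a book every night.",
             "Die Katze sitzt auf der Matte.", "Sie liest jeden Abend ein Buch."]
labels = [0, 0, 1, 1]  # 0 = English, 1 = German

def embed(sentence):
    """Mean-pool the last hidden layer into one sentence vector."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, hidden_dim)
    return hidden.mean(dim=1).squeeze(0).numpy()

X = [embed(s) for s in sentences]

# The weight vector of a linear language-ID probe is one rough estimate
# of a language-specific direction in the embedding space.
probe = LogisticRegression(max_iter=1000).fit(X, labels)
language_direction = probe.coef_[0]
print("probe accuracy on toy data:", probe.score(X, labels))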


References

Showing 1–10 of 14 references
What does it mean to be language-agnostic? Probing multilingual sentence encoders for typological properties
TL;DR: This work proposes methods for probing sentence representations from state-of-the-art multilingual encoders with respect to a range of typological properties pertaining to lexical, morphological, and syntactic structure, and shows interesting differences in encoding linguistic variation associated with different pretraining strategies.
Emerging Cross-lingual Structure in Pretrained Language Models
TL;DR: It is shown that transfer is possible even when there is no shared vocabulary across the monolingual corpora and also when the text comes from very different domains, which strongly suggests that, much like for non-contextual word embeddings, there are universal latent symmetries in the learned embedding spaces.
It’s not Greek to mBERT: Inducing Word-Level Translations from Multilingual BERT
TL;DR: The hypothesis that multilingual BERT learns representations which contain both a language-encoding component and an abstract, cross-lingual component is tested, and an empirical language-identity subspace within mBERT representations is identified.
Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT
TL;DR: This paper explores the broader cross-lingual potential of mBERT (multilingual BERT) as a zero-shot language transfer model on 5 NLP tasks covering a total of 39 languages from various language families: NLI, document classification, NER, POS tagging, and dependency parsing.
Analytical Methods for Interpretable Ultradense Word Embeddings
TL;DR: Three methods for making word spaces interpretable by rotation are investigated: Densifier (Rothe et al., 2016), linear SVMs, and DensRay, a new method that can be computed in closed form, is hyperparameter-free, and is thus more robust than Densifier.
Unsupervised Cross-lingual Representation Learning at Scale
TL;DR: It is shown that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks, and the possibility of multilingual modeling without sacrificing per-language performance is shown for the first time.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Transformers: State-of-the-Art Natural Language Processing
TL;DR: Transformers is an open-source library that consists of carefully engineered state-of-the-art Transformer architectures under a unified API and a curated collection of pretrained models made by and available for the community.
Computational and Theoretical Analysis of Null Space and Orthogonal Linear Discriminant Analysis
TL;DR: The main result shows that under a mild condition which holds in many applications involving high-dimensional data, NLDA is equivalent to OLDA, which confirms the effectiveness of the regularization in ROLDA.