Corpus ID: 236133964

Cross-Lingual BERT Contextual Embedding Space Mapping with Isotropic and Isometric Conditions

Haoran Xu, Philipp Koehn
Typically, a linear orthogonal transformation is learned by aligning static type-level embeddings to build a shared semantic space. In view of the analysis that contextual embeddings contain richer semantic features, we investigate a context-aware and dictionary-free mapping approach by leveraging parallel corpora. We illustrate that our contextual embedding space mapping significantly outperforms previous multilingual word embedding methods on the bilingual dictionary induction (BDI…
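The "linear orthogonal transformation" mentioned in the abstract is typically learned by solving the orthogonal Procrustes problem over a seed dictionary of translation pairs, which has a closed-form SVD solution. A minimal NumPy sketch (variable names are illustrative, not from the paper):

```python
import numpy as np

def learn_orthogonal_mapping(X, Y):
    """Solve min_W ||XW - Y||_F  s.t.  W^T W = I  (orthogonal Procrustes).

    X, Y: aligned embedding matrices (n_pairs x dim), where row i of X and
    row i of Y are the source- and target-language vectors of one
    translation pair. The closed-form solution is W = U V^T, where
    U S V^T is the SVD of X^T Y.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Sanity check: recover a known rotation from synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # ground-truth orthogonal map
Y = X @ Q
W = learn_orthogonal_mapping(X, Y)
print(np.allclose(X @ W, Y))  # → True
```

The orthogonality constraint is what preserves monolingual geometry (distances and angles) under the mapping, which is why it is the standard choice for static cross-lingual alignment.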


Cross-Lingual Alignment of Contextual Word Embeddings, with Applications to Zero-shot Dependency Parsing
A novel method for multilingual transfer that utilizes deep contextual embeddings pretrained in an unsupervised fashion and consistently outperforms the previous state-of-the-art on 6 tested languages, yielding an improvement of 6.8 LAS points on average.
Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization
Iterative Normalization consistently improves word translation accuracy of three CLWE methods, with the largest improvement observed on English-Japanese (from 2% to 44% test accuracy).
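The Iterative Normalization procedure summarized above alternates two simple operations, unit-length normalization and mean centering, until the embeddings approximately satisfy both conditions at once. A minimal sketch (the iteration count is an assumption, not taken from the paper):

```python
import numpy as np

def iterative_normalization(E, n_iters=5):
    """Alternate length normalization and mean centering of an embedding
    matrix E (n_words x dim). Each step undoes the other slightly, so the
    loop is repeated until the space is close to a fixed point that is
    both zero-mean and unit-length, making spaces across languages more
    nearly isomorphic before alignment."""
    E = E.copy()
    for _ in range(n_iters):
        E /= np.linalg.norm(E, axis=1, keepdims=True)  # unit length rows
        E -= E.mean(axis=0, keepdims=True)             # zero-mean space
    return E
```

Applying the same procedure to both languages' embeddings before learning the mapping is what yields the reported accuracy gains.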
Learning principled bilingual mappings of word embeddings while preserving monolingual invariance
This paper proposes a framework that generalizes previous work, provides an efficient exact method to learn the optimal linear transformation and yields the best bilingual results in translation induction while preserving monolingual performance in an analogy task.
On the Limitations of Unsupervised Bilingual Dictionary Induction
It is shown that a simple trick, exploiting a weak supervision signal from identical words, enables more robust induction and establishes a near-perfect correlation between unsupervised bilingual dictionary induction performance and a previously unexplored graph similarity metric.
A Survey of Cross-lingual Word Embedding Models
A comprehensive typology of cross-lingual word embedding models is provided, showing that many of the models presented in the literature optimize for the same objectives, and that seemingly different models are often equivalent modulo optimization strategies, hyper-parameters, and such.
Adversarial Training for Unsupervised Bilingual Lexicon Induction
This work shows that a cross-lingual connection can actually be established without any form of supervision by formulating the problem as a natural adversarial game, and investigates techniques that are crucial to successful training.
Bilingual Word Representations with Monolingual Quality in Mind
This work proposes a joint model that learns word representations from scratch, utilizing both the context co-occurrence information through the monolingual component and the meaning-equivalence signals from the bilingual constraint to learn high-quality bilingual representations efficiently.
Word Translation Without Parallel Data
It is shown that a bilingual dictionary can be built between two languages without using any parallel corpora, by aligning monolingual word embedding spaces in an unsupervised way.
BilBOWA: Fast Bilingual Distributed Representations without Word Alignments
It is shown that bilingual embeddings learned using the proposed BilBOWA model outperform state-of-the-art methods on a cross-lingual document classification task as well as a lexical translation task on WMT11 data.
How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings
It is found that in all layers of ELMo, BERT, and GPT-2, on average, less than 5% of the variance in a word’s contextualized representations can be explained by a static embedding for that word, providing some justification for the success of contextualized representations.
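The "variance explained by a static embedding" measurement summarized above can be approximated as the fraction of total variance captured by the single best static direction (the top principal component) of one word's contextualized vectors across contexts. A hedged NumPy sketch; the function name and exact estimator are assumptions, not the paper's code:

```python
import numpy as np

def static_explained_variance(contextual_vecs):
    """contextual_vecs: (n_occurrences x dim) contextualized vectors of a
    single word drawn from different contexts. Returns the fraction of
    total variance explained by the top principal direction, i.e. by the
    best possible single static vector for that word."""
    X = contextual_vecs - contextual_vecs.mean(axis=0)  # center occurrences
    s = np.linalg.svd(X, compute_uv=False)              # singular values
    return (s[0] ** 2) / (s ** 2).sum()                 # top-PC variance ratio
```

A value near 1 means the word's representations are essentially static; the finding quoted above corresponds to values below 0.05 on average for BERT, ELMo, and GPT-2.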