Corpus ID: 236133964

Cross-Lingual BERT Contextual Embedding Space Mapping with Isotropic and Isometric Conditions

@article{Xu2021CrossLingualBC,
  title={Cross-Lingual BERT Contextual Embedding Space Mapping with Isotropic and Isometric Conditions},
  author={Haoran Xu and Philipp Koehn},
  journal={ArXiv},
  year={2021},
  volume={abs/2107.09186}
}
Typically, a linearly orthogonal transformation mapping is learned by aligning static type-level embeddings to build a shared semantic space. In view of analyses showing that contextual embeddings contain richer semantic features, we investigate a context-aware and dictionary-free mapping approach by leveraging parallel corpora. We illustrate that our contextual embedding space mapping significantly outperforms previous multilingual word embedding methods on the bilingual dictionary induction (BDI…
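
For context, the conventional baseline the abstract refers to, an orthogonal map learned between static type-level embedding spaces from a seed dictionary, is usually solved in closed form as an orthogonal Procrustes problem. Below is a minimal NumPy sketch of that baseline (illustrative only, not the authors' context-aware, dictionary-free method; the toy data and names are placeholders):

import numpy as np

def orthogonal_procrustes_map(X, Y):
    """Learn an orthogonal W minimizing ||X W - Y||_F (orthogonal Procrustes).

    X, Y: (n, d) arrays of source/target embeddings for n seed-dictionary
    pairs, with rows assumed to correspond one-to-one.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt  # X @ W approximates Y

# Toy check: recover a known rotation from random stand-in "embeddings".
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 300))
Q, _ = np.linalg.qr(rng.normal(size=(300, 300)))  # a random orthogonal matrix
Y = X @ Q
W = orthogonal_procrustes_map(X, Y)
print(np.allclose(X @ W, Y, atol=1e-6))  # True: the rotation is recovered

Because W is constrained to be orthogonal, the learned map preserves distances and angles, i.e. it is an isometry between the two spaces.
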


References

Showing 1-10 of 30 references
Cross-Lingual Alignment of Contextual Word Embeddings, with Applications to Zero-shot Dependency Parsing
TLDR: A novel method for multilingual transfer that utilizes deep contextual embeddings, pretrained in an unsupervised fashion, and consistently outperforms the previous state of the art on 6 tested languages, yielding an improvement of 6.8 LAS points on average.
Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization
TLDR: Iterative Normalization consistently improves the word translation accuracy of three CLWE methods, with the largest improvement observed on English-Japanese (from 2% to 44% test accuracy).
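
As a rough sketch of the Iterative Normalization preprocessing summarized above (based on the commonly cited formulation that alternates length normalization with mean centering; exact details are assumptions, not taken from this page):

import numpy as np

def iterative_normalization(emb, n_iter=5, eps=1e-12):
    """Alternate unit-length scaling and mean centering of word vectors.

    emb: (vocab_size, dim) embedding matrix. After a few rounds the
    vectors are approximately zero-mean and unit-length at the same
    time, the precondition argued to help orthogonal mapping.
    """
    X = emb.astype(np.float64).copy()
    for _ in range(n_iter):
        X /= np.linalg.norm(X, axis=1, keepdims=True) + eps  # length-normalize rows
        X -= X.mean(axis=0, keepdims=True)                   # center the vocabulary
    return X

Both monolingual spaces would be normalized this way before learning the orthogonal cross-lingual map.
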
Learning principled bilingual mappings of word embeddings while preserving monolingual invariance
TLDR: This paper proposes a framework that generalizes previous work, provides an efficient exact method to learn the optimal linear transformation, and yields the best bilingual results in translation induction while preserving monolingual performance in an analogy task.
On the Limitations of Unsupervised Bilingual Dictionary Induction
TLDR: It is shown that a simple trick, exploiting a weak supervision signal from identical words, enables more robust induction, and a near-perfect correlation is established between unsupervised bilingual dictionary induction performance and a previously unexplored graph similarity metric.
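
The graph similarity metric mentioned in this summary can be sketched roughly as follows: build a nearest-neighbor graph over each embedding space and compare the spectra of the graph Laplacians. The neighborhood size and the 90% energy cut-off below are illustrative assumptions, not details taken from this page.

import numpy as np

def laplacian_spectrum(emb, n_neighbors=10):
    """Eigenvalues of the unnormalized Laplacian of a k-NN graph."""
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    sims = (emb @ emb.T) / (norms * norms.T)          # cosine similarities
    np.fill_diagonal(sims, -np.inf)                   # exclude self-neighbors
    nn = np.argsort(-sims, axis=1)[:, :n_neighbors]   # indices of k nearest neighbors
    adj = np.zeros_like(sims)
    adj[np.repeat(np.arange(len(emb)), n_neighbors), nn.ravel()] = 1.0
    adj = np.maximum(adj, adj.T)                      # symmetrize the graph
    lap = np.diag(adj.sum(axis=1)) - adj              # unnormalized Laplacian
    return np.sort(np.linalg.eigvalsh(lap))[::-1]

def eigen_similarity(emb1, emb2, threshold=0.9):
    """Summed squared difference of the leading Laplacian eigenvalues
    (a larger value means the two spaces are less nearly isomorphic)."""
    e1, e2 = laplacian_spectrum(emb1), laplacian_spectrum(emb2)
    def cutoff(e):
        return int(np.searchsorted(np.cumsum(e) / e.sum(), threshold)) + 1
    k = min(cutoff(e1), cutoff(e2))
    return float(np.sum((e1[:k] - e2[:k]) ** 2))
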
A Survey of Cross-lingual Word Embedding Models
TLDR: A comprehensive typology of cross-lingual word embedding models is provided, showing that many of the models presented in the literature optimize for the same objectives, and that seemingly different models are often equivalent modulo optimization strategies, hyper-parameters, and the like.
Adversarial Training for Unsupervised Bilingual Lexicon Induction
TLDR: This work shows that a cross-lingual connection can actually be established without any form of supervision by formulating the problem as a natural adversarial game, and it investigates techniques that are crucial to successful training.
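
A minimal PyTorch sketch of the adversarial game described above: a linear mapper tries to make mapped source vectors indistinguishable from target vectors, while a discriminator tries to tell them apart. The architecture sizes, optimizers, and hyper-parameters here are arbitrary placeholders rather than the authors' recipe.

import torch
import torch.nn as nn

def adversarial_mapping(src_emb, tgt_emb, dim=300, steps=10000, batch=128):
    """Learn a linear map W by playing mapper vs. discriminator (sketch)."""
    mapper = nn.Linear(dim, dim, bias=False)
    disc = nn.Sequential(nn.Linear(dim, 512), nn.LeakyReLU(0.2), nn.Linear(512, 1))
    opt_m = torch.optim.SGD(mapper.parameters(), lr=0.1)
    opt_d = torch.optim.SGD(disc.parameters(), lr=0.1)
    bce = nn.BCEWithLogitsLoss()

    for _ in range(steps):
        xs = src_emb[torch.randint(len(src_emb), (batch,))]
        xt = tgt_emb[torch.randint(len(tgt_emb), (batch,))]

        # Discriminator step: mapped source should get label 0, target label 1.
        with torch.no_grad():
            mapped = mapper(xs)
        d_loss = bce(disc(mapped), torch.zeros(batch, 1)) + bce(disc(xt), torch.ones(batch, 1))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # Mapper step: fool the discriminator into labeling mapped source as 1.
        m_loss = bce(disc(mapper(xs)), torch.ones(batch, 1))
        opt_m.zero_grad(); m_loss.backward(); opt_m.step()

    return mapper.weight.detach()  # (dim, dim) mapping matrix, applied as x @ W.T
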
Bilingual Word Representations with Monolingual Quality in Mind
TLDR: This work proposes a joint model to learn word representations from scratch that utilizes both the context co-occurrence information, through the monolingual component, and the meaning-equivalence signals, from the bilingual constraint, to learn high-quality bilingual representations efficiently.
Word Translation Without Parallel Data
TLDR: It is shown that a bilingual dictionary can be built between two languages without using any parallel corpora, by aligning monolingual word embedding spaces in an unsupervised way.
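
This reference also popularized the Cross-domain Similarity Local Scaling (CSLS) retrieval criterion used for the dictionary induction step once the spaces are aligned; it penalizes plain cosine similarity to counteract hubness. A rough NumPy sketch (parameter names are placeholders):

import numpy as np

def csls_scores(mapped_src, tgt, k=10):
    """CSLS retrieval scores: cosine similarity penalized by each vector's
    mean similarity to its k nearest cross-lingual neighbors."""
    unit = lambda x: x / np.linalg.norm(x, axis=1, keepdims=True)
    cos = unit(mapped_src) @ unit(tgt).T                 # (n_src, n_tgt)
    r_src = np.sort(cos, axis=1)[:, -k:].mean(axis=1)    # avg sim to k-NN in target
    r_tgt = np.sort(cos, axis=0)[-k:, :].mean(axis=0)    # avg sim to k-NN in source
    return 2 * cos - r_src[:, None] - r_tgt[None, :]

# Induced translation of each source word:
# predictions = csls_scores(mapped_source_emb, target_emb).argmax(axis=1)
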
BilBOWA: Fast Bilingual Distributed Representations without Word Alignments
TLDR: It is shown that bilingual embeddings learned using the proposed BilBOWA model outperform state-of-the-art methods on a cross-lingual document classification task as well as a lexical translation task on WMT11 data.
How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings
TLDR: It is found that in all layers of ELMo, BERT, and GPT-2, on average, less than 5% of the variance in a word’s contextualized representations can be explained by a static embedding for that word, providing some justification for the success of contextualized representations.
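
The "variance explained by a static embedding" statistic can be approximated per word by collecting that word's contextualized vectors across many sentences and measuring how much variance a single direction (the first principal component) captures. A rough NumPy sketch; details such as mean-centering are assumptions rather than the reference's exact procedure.

import numpy as np

def variance_explained_by_first_pc(context_vectors):
    """Fraction of variance in one word's contextualized representations
    captured by its first principal component.

    context_vectors: (n_occurrences, dim) array of the same word's
    embeddings collected from different sentences and contexts.
    """
    X = context_vectors - context_vectors.mean(axis=0, keepdims=True)
    s = np.linalg.svd(X, compute_uv=False)    # singular values of centered data
    return float(s[0] ** 2 / np.sum(s ** 2))  # variance share of the top direction

Values near 1 would mean a single static vector suffices for that word; the reference's finding of under 5% indicates the opposite.
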