Cross-lingual Word Analogies using Linear Transformations between Semantic Spaces

@article{Brychcin2019CrosslingualWA,
  title={Cross-lingual Word Analogies using Linear Transformations between Semantic Spaces},
  author={Tomas Brychcin and Stephen Eugene Taylor and Luk{\'a}s Svoboda},
  journal={Expert Syst. Appl.},
  year={2019},
  volume={135},
  pages={287-295}
}
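
Below is a minimal sketch (not the authors' code) of the paper's core idea, assuming pre-trained monolingual embeddings for two languages and a linear map W from the source space into the target space, e.g. a least-squares map of the kind sketched under the references below. The names tgt_words and tgt_vecs are hypothetical placeholders for the target vocabulary and its embedding matrix. A cross-lingual analogy a : b :: c : ?, with a and b in the source language and c and the answer in the target language, is solved with the usual vector-offset method:

```python
import numpy as np

def cross_lingual_analogy(a_src, b_src, c_tgt, W, tgt_words, tgt_vecs):
    """a_src, b_src: source-space vectors; c_tgt: target-space vector;
    W: (d_src, d_tgt) linear map; tgt_vecs: (V, d_tgt) target matrix."""
    query = (b_src - a_src) @ W + c_tgt           # map the offset, add c
    query /= np.linalg.norm(query)
    T = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    return tgt_words[int(np.argmax(T @ query))]   # nearest word by cosine
```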

Citations

Understanding Linearity of Cross-Lingual Word Embedding Mappings
TLDR
This work presents a theoretical analysis that identifies the preservation of analogies encoded in monolingual word embeddings as a necessary and sufficient condition for the ground-truth cross-lingual word embedding (CLWE) mapping between those embeddings to be linear.
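A toy numpy check of one direction of this claim (an illustration, not the paper's proof): any linear map preserves vector offsets, so analogies encoded as offsets survive the mapping exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(50, 50))        # any linear CLWE mapping
a, b, c = rng.normal(size=(3, 50))
d = c + (b - a)                      # d completes the analogy a : b :: c : d
# (b - a) @ W == b @ W - a @ W, so the offset is preserved by W:
print(np.allclose(b @ W - a @ W, d @ W - c @ W))   # True
```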
Developing a Cross-lingual Semantic Word Similarity Corpus for English–Urdu Language Pair
TLDR
The evaluation of a baseline approach, "Translation Plus Monolingual Analysis", for automatically identifying semantic similarity between English–Urdu word pairs showed that the path-length similarity measure performs better on Google- and Bing-translated words.
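A rough sketch of such a "Translation Plus Monolingual Analysis" baseline (translate_to_english is a hypothetical stand-in for a Google or Bing translation call): translate the Urdu word, then compute the path-length measure monolingually with NLTK's WordNet interface.

```python
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

def path_length_similarity(urdu_word, english_word, translate_to_english):
    """Translate the Urdu word, then take the best WordNet
    path-length similarity over all synset pairs."""
    translated = translate_to_english(urdu_word)   # hypothetical MT call
    best = 0.0
    for s1 in wn.synsets(translated):
        for s2 in wn.synsets(english_word):
            sim = s1.path_similarity(s2)           # None for some POS pairs
            if sim is not None:
                best = max(best, sim)
    return best
```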
Analyzing variation in translation through neural semantic spaces
TLDR
Using neural word embeddings (Word2Vec), the bilingual semantic spaces emanating from source-to-translation and source-to-interpreting alignments are compared, and the summative linguistic effects of one mode vs. the other are explored at the lexical level.
Tracing variation in discourse connectives in translation and interpreting through neural semantic spaces
TLDR
This paper compares bilingual neural word embeddings trained on source-to-translation and source-to-interpreting aligned corpora, and shows more variation of semantically related items in translation spaces vs. interpreting ones, as well as a more consistent use of fewer connectives in interpreting.
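A minimal gensim sketch of this comparison set-up (the corpus file names are hypothetical placeholders): train Word2Vec separately on the translation and interpreting sides of the aligned corpora, then inspect a connective's nearest neighbours in each space.

```python
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

trans = Word2Vec(LineSentence("target_translation.txt"),
                 vector_size=100, window=5, min_count=5)
interp = Word2Vec(LineSentence("target_interpreting.txt"),
                  vector_size=100, window=5, min_count=5)

# More varied neighbour sets suggest more lexical variation in that mode.
print(trans.wv.most_similar("however", topn=10))
print(interp.wv.most_similar("however", topn=10))
```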
Multilingual Culture-Independent Word Analogy Datasets
TLDR
This work designed the monolingual analogy task to be much more culturally independent and also constructed cross-lingual analogy datasets for the involved languages and presents basic statistics of the created datasets and their initial evaluation using fastText embeddings.
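A hedged sketch of such an initial evaluation (the file names are hypothetical placeholders): load pre-trained fastText vectors and run gensim's built-in analogy evaluation over a questions-words-style analogy file.

```python
from gensim.models.fasttext import load_facebook_vectors

kv = load_facebook_vectors("cc.sl.300.bin")        # pre-trained fastText model
score, sections = kv.evaluate_word_analogies("monolingual_analogies.txt")
print(f"overall analogy accuracy: {score:.3f}")
```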
Semantic Space Transformations for Cross-Lingual Document Classification
TLDR
Three promising transformation methods are evaluated on the cross-lingual document classification task; the results show that a convolutional network achieves better results than a maximum entropy classifier, and two of the proposed approaches are competitive with the state of the art.
UWB@RuShiftEval: Measuring Semantic Difference as per-word Variation in Aligned Semantic Spaces
TLDR
This work presents a system, developed for the RuShiftEval competition, for measuring lexical semantic change between pairs of corpora; it measures the similarity between the transformed vectors for each test word.
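One plausible reading of this per-word variation idea, sketched with SciPy rather than the authors' exact pipeline: align the two corpora's embedding matrices (rows matched by shared vocabulary) with orthogonal Procrustes, then rank test words by cosine distance after alignment.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

def semantic_change(X_old, X_new, test_idx):
    """X_old, X_new: (V, d) matrices with rows matched by shared
    vocabulary; test_idx: indices of the test words."""
    R, _ = orthogonal_procrustes(X_old, X_new)     # rotation aligning spaces
    a = (X_old @ R)[test_idx]
    b = X_new[test_idx]
    a /= np.linalg.norm(a, axis=1, keepdims=True)
    b /= np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - np.sum(a * b, axis=1)             # higher = more change
```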
EMBEDDIA: Cross-Lingual Embeddings for Less-Represented Languages in European News Media
TLDR
This report reviews the literature on biases of linguistic models, discusses biases in journalism, describes technical notions of bias, and makes recommendations for detecting and avoiding biases in the context of news.
...

References

Showing 1-10 of 42 references
Cross-lingual Dependency Parsing Based on Distributed Representations
TLDR
This paper provides two algorithms for inducing cross-lingual distributed representations of words, which map vocabularies from two different languages into a common vector space and bridge the lexical feature gap by using distributed feature representations and their composition.
A Framework for the Construction of Monolingual and Cross-lingual Word Similarity Datasets
TLDR
This work proposes an automatic standardization for the construction of cross-lingual similarity datasets, and provides an evaluation, demonstrating its reliability and robustness.
SemEval-2017 Task 2: Multilingual and Cross-lingual Semantic Word Similarity
TLDR
Results show that systems that combine statistical knowledge from text corpora, in the form of word embeddings, and external knowledge from lexical resources are best performers in both subtasks.
Cross-lingual Models of Word Embeddings: An Empirical Comparison
TLDR
It is shown that models which require expensive cross-lingual knowledge almost always perform better, but cheaply supervised models often prove competitive on certain tasks.
A Strong Baseline for Learning Cross-Lingual Word Embeddings from Sentence Alignments
TLDR
It is suggested that adding additional sources of information, which go beyond the traditional signal of bilingual sentence-aligned corpora, may substantially improve cross-lingual word embeddings, and that future baselines should at least take such features into account.
Bilingual Word Embeddings from Parallel and Non-parallel Corpora for Cross-Language Text Classification
TLDR
Bilingual paRAgraph VEctors (BRAVE) is introduced, a model to learn bilingual distributed representations of words without word alignments, from either sentence-aligned parallel or label-aligned non-parallel document corpora, to support cross-language text classification.
On the Role of Seed Lexicons in Learning Bilingual Word Embeddings
TLDR
Effectively, it is demonstrated that a shared bilingual word embedding space (SBWES) may be induced by leveraging only a very weak bilingual signal (document alignments) along with monolingual data.
Bilingual Distributed Word Representations from Document-Aligned Comparable Data
TLDR
It is revealed that BWEs may be learned solely on the basis of document-aligned comparable data, without any additional lexical resources or syntactic information.
Improving Vector Space Word Representations Using Multilingual Correlation
TLDR
This paper argues that lexico-semantic content should additionally be invariant across languages and proposes a simple technique based on canonical correlation analysis (CCA) for incorporating multilingual evidence into vectors generated monolingually.
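A minimal scikit-learn sketch of this CCA technique (not the authors' exact tooling; the matrices are random stand-ins for embeddings of translation pairs): CCA projects both languages into maximally correlated shared dimensions.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
X_en = rng.normal(size=(5000, 300))   # stand-in: English vectors of seed pairs
X_de = rng.normal(size=(5000, 300))   # stand-in: German vectors of seed pairs

cca = CCA(n_components=40)
cca.fit(X_en, X_de)                   # learn maximally correlated directions
X_en_c, X_de_c = cca.transform(X_en, X_de)   # shared projections
```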
Exploiting Similarities among Languages for Machine Translation
TLDR
This method translates missing word and phrase entries by learning language structures from large monolingual data and a mapping between languages from small bilingual data; it uses distributed representations of words and learns a linear mapping between the vector spaces of the two languages.
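A hedged sketch of this well-known linear-mapping recipe (not the authors' exact code): learn W by least squares on a seed dictionary of translation pairs, then translate by mapping a source vector and taking the cosine nearest neighbour in the target space.

```python
import numpy as np

def fit_linear_map(X_src, Y_tgt):
    """Rows i of X_src and Y_tgt embed the two sides of seed pair i;
    solve min_W ||X_src @ W - Y_tgt||_F by least squares."""
    W, *_ = np.linalg.lstsq(X_src, Y_tgt, rcond=None)
    return W

def translate(v_src, W, tgt_words, tgt_vecs):
    q = v_src @ W
    q /= np.linalg.norm(q)
    T = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    return tgt_words[int(np.argmax(T @ q))]        # cosine nearest neighbour
```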
...