Problems With Evaluation of Word Embeddings Using Word Similarity Tasks

@article{Faruqui2016ProblemsWE,
  title={Problems With Evaluation of Word Embeddings Using Word Similarity Tasks},
  author={Manaal Faruqui and Yulia Tsvetkov and Pushpendre Rastogi and Chris Dyer},
  journal={ArXiv},
  year={2016},
  volume={abs/1605.02276}
}
Lacking standardized extrinsic evaluation methods for vector representations of words, the NLP community has relied heavily on word similarity tasks as a proxy for intrinsic evaluation of word vectors. Word similarity evaluation, which correlates the distance between vectors with human judgments of semantic similarity, is attractive because it is computationally inexpensive and fast. In this paper we present several problems associated with the evaluation of word vectors on word similarity…
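
A minimal sketch of the evaluation protocol the abstract describes, assuming a dict of word vectors and a list of human-scored word pairs (names illustrative, not from the paper): score each pair by cosine similarity and correlate the scores with the human judgments via Spearman's rank correlation.

    import numpy as np
    from scipy.stats import spearmanr

    def cosine(u, v):
        # cosine similarity between two vectors
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    def evaluate_similarity(vectors, pairs):
        """vectors: dict word -> np.ndarray; pairs: list of (w1, w2, human_score)."""
        model_scores, human_scores = [], []
        for w1, w2, score in pairs:
            if w1 in vectors and w2 in vectors:  # skip out-of-vocabulary pairs
                model_scores.append(cosine(vectors[w1], vectors[w2]))
                human_scores.append(score)
        rho, _ = spearmanr(model_scores, human_scores)
        return rho

Pipelines of exactly this shape are cheap to run, which is why, as the abstract notes, they became the default proxy for intrinsic evaluation.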

Citing Papers

Improving Semantic Similarity of Words by Retrofitting Word Vectors in Sense Level

TLDR
This paper uses semantic relations as positive and negative examples to re-train the vectors of a pre-trained model, rather than integrating them into the objective function used during training, in order to improve the semantic distinction of words.
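
The retrofitting family of methods this builds on has a compact core update: pull each vector toward its pre-trained value and toward its lexicon neighbors. A minimal sketch of that basic, positive-examples-only update with uniform weights follows; the sense-level treatment and negative examples described above are not shown.

    import numpy as np

    def retrofit(vectors, lexicon, iters=10, alpha=1.0, beta=1.0):
        """vectors: dict word -> np.ndarray (pre-trained);
        lexicon: dict word -> list of semantically related words."""
        new = {w: v.copy() for w, v in vectors.items()}
        for _ in range(iters):
            for w, neighbors in lexicon.items():
                nbrs = [n for n in neighbors if n in new]
                if w not in new or not nbrs:
                    continue
                # weighted average of the original vector and current neighbor vectors
                total = alpha * vectors[w] + beta * sum(new[n] for n in nbrs)
                new[w] = total / (alpha + beta * len(nbrs))
        return new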

Dual embeddings and metrics for word and relational similarity

TLDR
This work shows how the best aspects of both approaches to word embeddings, under different metrics, can be captured, and achieves state-of-the-art performance on standard word and relational similarity benchmarks.

Retrofitting Word Representations for Unsupervised Sense Aware Word Similarities

TLDR
It is argued that minor senses play an important role in word similarity computations, hence an unsupervised sense inventory resource is used to retrofit monolingual word embeddings, producing sense-aware embeddings.

A Survey of Word Embeddings Evaluation Methods

TLDR
An extensive overview of the field of word embeddings evaluation is presented, highlighting main problems and proposing a typology of approaches to evaluation, summarizing 16 intrinsic methods and 12 extrinsic methods.

Geographical Evaluation of Word Embeddings

TLDR
This work proposes a novel principle that compares the information from word embeddings with reality, and implements this principle by comparing the information in the word embeddings with the geographical positions of cities.
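
One concrete reading of this principle: correlate pairwise distances between city-name embeddings with real great-circle distances. The sketch below, with an illustrative city list and the standard haversine formula, is one plausible instantiation rather than the paper's exact protocol.

    import numpy as np
    from scipy.stats import spearmanr

    def haversine(lat1, lon1, lat2, lon2, r=6371.0):
        # great-circle distance in kilometres between two (lat, lon) points
        lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
        a = (np.sin((lat2 - lat1) / 2) ** 2
             + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
        return 2 * r * np.arcsin(np.sqrt(a))

    def geo_eval(vectors, cities):
        """vectors: dict name -> np.ndarray; cities: list of (name, lat, lon)."""
        emb_d, geo_d = [], []
        for i in range(len(cities)):
            for j in range(i + 1, len(cities)):
                (n1, la1, lo1), (n2, la2, lo2) = cities[i], cities[j]
                if n1 in vectors and n2 in vectors:
                    emb_d.append(np.linalg.norm(vectors[n1] - vectors[n2]))
                    geo_d.append(haversine(la1, lo1, la2, lo2))
        rho, _ = spearmanr(emb_d, geo_d)  # higher rho = embeddings track geography
        return rho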

SWOW-8500: Word Association task for Intrinsic Evaluation of Word Embeddings

TLDR
A novel intrinsic evaluation task employing large word association datasets (particularly the Small World of Words dataset) is proposed, and correlations are reported not only between performance on SWOW-8500 and previously proposed intrinsic word similarity tasks, but also with downstream tasks (e.g., Text Classification and Natural Language Inference).

Just Rank: Rethinking Evaluation with Word and Sentence Similarities

TLDR
This paper first points out the problems with using semantic similarity as the gold standard for word and sentence embedding evaluations, and proposes a new intrinsic evaluation method called EvalRank, which shows a much stronger correlation with downstream tasks.
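
A ranking-flavored evaluation in this spirit can be sketched as follows: for each positively related word pair, ask where the second word ranks among the full vocabulary by cosine similarity to the first, and report hits@k. This is one reading of a ranking-based metric, not EvalRank's exact formulation.

    import numpy as np

    def hits_at_k(emb, vocab, positive_pairs, k=10):
        """emb: (V, d) row-normalized embedding matrix; vocab: dict word -> row index;
        positive_pairs: list of (w1, w2) judged similar."""
        hits = 0
        for w1, w2 in positive_pairs:
            sims = emb @ emb[vocab[w1]]           # cosine similarity to every word
            sims[vocab[w1]] = -np.inf             # exclude the query word itself
            topk = np.argpartition(-sims, k)[:k]  # indices of the k most similar
            hits += vocab[w2] in topk
        return hits / len(positive_pairs)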

Intrinsic Word Embedding Model Evaluation for Lithuanian Language Using Adapted Similarity and Relatedness Benchmark Datasets

TLDR
Research on adapting the intrinsic similarity and relatedness tasks to the Lithuanian language and on evaluating word embedding models, testing the quality of representations independently of specific natural language processing tasks, suggests that the dimension parameter has a significant impact on the evaluation results.

Evaluating Word Embedding Hyper-Parameters for Similarity and Analogy Tasks

TLDR
Using standard embedding evaluation metrics and datasets, a study is conducted to empirically measure the impact of hyper-parameters such as vector dimensions and corpus size when training embedding models.

Improving Word Embeddings for Low Frequency Words by Pseudo Contexts

TLDR
It is found that the average similarities of low-frequency words are consistently higher than those of high-frequency words, and that as word frequency approaches roughly 400, the average similarity tends to stabilize.
...

References

Showing 1-10 of 46 references

Evaluation of Word Vector Representations by Subspace Alignment

TLDR
QVEC is presented—a computationally inexpensive intrinsic evaluation measure of the quality of word embeddings based on alignment to a matrix of features extracted from manually crafted lexical resources—that obtains strong correlation with performance of the vectors in a battery of downstream semantic evaluation tasks.
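
The core computation behind a QVEC-style score is easy to sketch: standardize the embedding matrix and the linguistic feature matrix over a shared vocabulary, correlate every embedding dimension with every feature column, and sum each dimension's best correlation. The matrix names below are illustrative; the released QVEC tool should be preferred for real comparisons.

    import numpy as np

    def qvec_score(X, S):
        """X: (n_words, d) embedding matrix; S: (n_words, p) linguistic feature
        matrix (e.g., supersense indicators), row-aligned on the same vocabulary."""
        Xc = (X - X.mean(0)) / (X.std(0) + 1e-8)  # column-wise z-scores
        Sc = (S - S.mean(0)) / (S.std(0) + 1e-8)
        corr = Xc.T @ Sc / X.shape[0]             # (d, p) Pearson correlations
        # align each embedding dimension with its single best feature dimension
        return corr.max(axis=1).sum()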

Community Evaluation and Exchange of Word Vectors at wordvectors.org

TLDR
This work presents a website and suite of offline tools that facilitate evaluation of word vectors on standard lexical semantics benchmarks and permit exchange and archival by users who wish to find good vectors for their applications.

Efficient Estimation of Word Representations in Vector Space

TLDR
Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.

Improving Vector Space Word Representations Using Multilingual Correlation

TLDR
This paper argues that lexico-semantic content should additionally be invariant across languages and proposes a simple technique based on canonical correlation analysis (CCA) for incorporating multilingual evidence into vectors generated monolingually.
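
A minimal sketch of a CCA projection of this kind, using scikit-learn on vectors for aligned translation pairs; the array names and the component count are assumptions rather than the paper's exact setup.

    import numpy as np
    from sklearn.cross_decomposition import CCA

    def cca_project(en_vecs, fr_vecs, n_components=100):
        """en_vecs, fr_vecs: (n_pairs, d) arrays of monolingual vectors for
        aligned translation pairs; returns both sides projected into the
        shared correlated space."""
        cca = CCA(n_components=n_components)  # n_components must be <= d
        cca.fit(en_vecs, fr_vecs)
        en_proj, fr_proj = cca.transform(en_vecs, fr_vecs)
        return en_proj, fr_proj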

Improving Word Representations via Global Context and Multiple Word Prototypes

TLDR
A new neural network architecture is presented which learns word embeddings that better capture the semantics of words by incorporating both local and global document context, and accounts for homonymy and polysemy by learning multiple embeddings per word.

Improving Distributional Similarity with Lessons Learned from Word Embeddings

TLDR
It is revealed that much of the performance gains of word embeddings are due to certain system design choices and hyperparameter optimizations, rather than the embedding algorithms themselves, and these modifications can be transferred to traditional distributional models, yielding similar gains.

Evaluation methods for unsupervised word embeddings

TLDR
A comprehensive study of evaluation methods for unsupervised embedding techniques that obtain meaningful representations of words from text, calling into question the common assumption that there is one single optimal vector representation.

Large-scale learning of word relatedness with constraints

TLDR
A large-scale data mining approach to learning word-word relatedness, in which known pairs of related words impose constraints on the learning process; the method learns for each word a low-dimensional representation that strives to maximize the likelihood of the word given the contexts in which it appears.

Deep Multilingual Correlation for Improved Word Embeddings

TLDR
Deep non-linear transformations of word embeddings of the two languages are learned, using the recently proposed deep canonical correlation analysis, to improve their quality and consistency on multiple word and bigram similarity tasks.

A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches

TLDR
This paper presents and compares WordNet-based and distributional similarity approaches, and pioneers cross-lingual similarity, showing that the methods are easily adapted for a cross-lingual task with minor losses.