Intrinsic Evaluations of Word Embeddings: What Can We Do Better?

Anna Rogers, Aleksandr Drozd
This paper presents an analysis of existing methods for the intrinsic evaluation of word embeddings. We show that the main methodological premise of such evaluations is “interpretability” of word embeddings: a “good” embedding produces results that make sense in terms of traditional linguistic categories. This approach is not only of limited practical use, but also fails to do justice to the strengths of distributional meaning representations. We argue for a shift from abstract ratings of word… 


A Survey of Word Embeddings Evaluation Methods

An extensive overview of the field of word embedding evaluation is presented, highlighting its main problems, proposing a typology of approaches to evaluation, and summarizing 16 intrinsic and 12 extrinsic methods.

What’s in Your Embedding, And How It Predicts Task Performance

This work presents a new approach based on scaled-up qualitative analysis of word vector neighborhoods that quantifies interpretable characteristics of a given model, enabling multi-faceted evaluation, parameter search, and, more generally, a principled, hypothesis-driven approach to the development of distributional semantic representations.

Geographical Evaluation of Word Embeddings

This work proposes a novel principle that compares the information in word embeddings with reality, and implements it by comparing that information with the geographical positions of cities.

Evaluating Word Embedding Hyper-Parameters for Similarity and Analogy Tasks

Using standard embedding evaluation metrics and datasets, a study is conducted to empirically measure the impact of hyper-parameters such as vector dimensionality and corpus size when training embedding models.

Analyzing the Surprising Variability in Word Embedding Stability Across Languages

Linguistic properties related to stability are discussed, drawing out insights about correlations with affixing, grammatical gender systems, and other features, with implications both for embedding use and for research that uses embeddings to study language trends.

Investigating Different Syntactic Context Types and Context Representations for Learning Word Embeddings

This paper provides a systematic investigation of 4 different syntactic context types and context representations for learning word embeddings, intended to help practitioners choose the best context type and representation for a given task.

Evaluating word embedding models: methods and experimental results

An extensive evaluation of a large number of word embedding models for language processing applications is conducted, showing that different evaluators focus on different aspects of word models and that some correlate more strongly with downstream natural language processing tasks.

Eliciting Explicit Knowledge From Domain Experts in Direct Intrinsic Evaluation of Word Embeddings for Specialized Domains

It is found that inter-rater agreement rates are similar to those of more conventional semantic annotation tasks, suggesting that these tasks can be used to evaluate word embeddings of text types for which implicit knowledge may not suffice.

Theoretical foundations and limits of word embeddings: what types of meaning can they capture?

This work theorizes the ways in which word embeddings model three core premises of a structural linguistic theory of meaning: that meaning is relational, coherent, and analyzable as a static system.

From static to dynamic word representations: a survey

This survey provides a comprehensive typology of word representation models from the novel perspective that the development from static to dynamic embeddings can effectively address the polysemy problem.

Evaluation methods for unsupervised word embeddings

A comprehensive study of evaluation methods for unsupervised embedding techniques that obtain meaningful representations of words from text is presented, calling into question the common assumption that there is one single optimal vector representation.

How to Generate a Good Word Embedding

The authors analyze three critical components in training word embeddings: model, corpus, and training parameters, and evaluate each word embedding in three ways: analyzing its semantic properties, using it as a feature for supervised tasks, and using it to initialize neural networks.

Improving Distributional Similarity with Lessons Learned from Word Embeddings

It is revealed that much of the performance gains of word embeddings are due to certain system design choices and hyperparameter optimizations, rather than the embedding algorithms themselves, and these modifications can be transferred to traditional distributional models, yielding similar gains.

Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn’t.

This study applies the widely used vector offset method to 4 types of linguistic relations: inflectional and derivational morphology, and lexicographic and encyclopedic semantics, and systematically examines how accuracy for different categories is affected by window size and dimensionality of the SVD-based word embeddings.
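The vector offset method referenced above is simple enough to sketch directly: to solve an analogy a : b :: c : ?, it returns the word whose vector is closest (by cosine similarity) to b − a + c, excluding the three query words. A minimal illustration with toy, hand-picked vectors (real vectors would come from a trained model such as word2vec or the SVD-based embeddings the study uses):

```python
import numpy as np

# Toy embeddings with illustrative values only (not from a real model).
emb = {
    "king":  np.array([0.8, 0.9, 0.1]),
    "man":   np.array([0.7, 0.1, 0.1]),
    "woman": np.array([0.6, 0.1, 0.9]),
    "queen": np.array([0.7, 0.9, 0.9]),
    "apple": np.array([0.1, 0.5, 0.2]),
}

def analogy(a, b, c, emb):
    """Solve a : b :: c : ? with the vector offset method: return the
    word whose vector is most cosine-similar to b - a + c, excluding
    the three query words themselves."""
    target = emb[b] - emb[a] + emb[c]
    best, best_sim = None, -np.inf
    for w, v in emb.items():
        if w in (a, b, c):
            continue
        sim = v @ target / (np.linalg.norm(v) * np.linalg.norm(target))
        if sim > best_sim:
            best, best_sim = w, sim
    return best

print(analogy("man", "king", "woman", emb))  # "queen" for these toy vectors
```

Excluding the query words matters in practice: without it, the nearest neighbor of b − a + c is very often b or c itself.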

Evaluation of Word Vector Representations by Subspace Alignment

QVEC is presented—a computationally inexpensive intrinsic evaluation measure of the quality of word embeddings based on alignment to a matrix of features extracted from manually crafted lexical resources—that obtains strong correlation with performance of the vectors in a battery of downstream semantic evaluation tasks.
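The alignment idea behind such a measure can be sketched in a few lines: pair each embedding dimension with the lexical-resource feature column it correlates with best, and score the embedding by the total correlation. The snippet below is a simplified toy version of that idea, with invented data; it is not the exact QVEC algorithm or its alignment constraints:

```python
import numpy as np

# Rows are words, columns are embedding dimensions (toy values).
X = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.1, 0.9],
              [0.2, 0.7]])

# Rows are the same words, columns are binary linguistic features
# from a hand-crafted lexicon (toy values).
S = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]])

def alignment_score(X, S):
    """Simplified QVEC-style score: for each embedding dimension,
    take its best Pearson correlation with any feature column,
    and sum over dimensions (higher = better aligned)."""
    total = 0.0
    for i in range(X.shape[1]):
        corrs = [np.corrcoef(X[:, i], S[:, j])[0, 1]
                 for j in range(S.shape[1])]
        total += max(corrs)
    return total

print(alignment_score(X, S))
```

Because each dimension scores at most 1.0, the maximum here is 2.0; an embedding whose dimensions track the lexicon's features closely approaches that bound.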

Controlled Experiments for Word Embeddings

An experimental approach to studying the properties of word embeddings is proposed. Controlled experiments, achieved through modifications of the training corpus, permit the demonstration of direct…

The Role of Context Types and Dimensionality in Learning Word Embeddings

We provide the first extensive evaluation of how using different types of context to learn skip-gram word embeddings affects performance on a wide range of intrinsic and extrinsic NLP tasks. Our…

Specializing Word Embeddings for Similarity or Relatedness

It is shown that using specialized spaces in NLP tasks and applications leads to clear improvements for document classification and synonym selection, tasks that rely on either similarity or relatedness but not both.

How we BLESSed distributional semantic evaluation

BLESS contains a set of tuples instantiating different, explicitly typed semantic relations, plus a number of controlled random tuples, making it possible to assess the ability of a model to detect truly related word pairs, as well as to perform in-depth analyses of the types of semantic relations that a model favors.

Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space

An extension to the Skip-gram model that efficiently learns multiple embeddings per word type is presented, and its scalability is demonstrated by training with one machine on a corpus of nearly 1 billion tokens in less than 6 hours.