GloVe: Global Vectors for Word Representation

@inproceedings{pennington2014glove,
  title={GloVe: Global Vectors for Word Representation},
  author={Jeffrey Pennington and Richard Socher and Christopher D. Manning},
  booktitle={Conference on Empirical Methods in Natural Language Processing},
  year={2014}
}
Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. [...] Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word cooccurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with…
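
The weighted least-squares objective sketched in the abstract can be illustrated concretely. The snippet below is a minimal sketch assuming a dict of nonzero co-occurrence counts; the weighting function f(x) = (x/x_max)^alpha with x_max = 100 and alpha = 0.75 follows the paper, while all variable names and the toy data are chosen here for illustration.

```python
import numpy as np

def glove_loss(W, W_tilde, b, b_tilde, cooc, x_max=100.0, alpha=0.75):
    """Weighted least-squares GloVe objective, summed only over the
    nonzero co-occurrence entries (cooc is a dict {(i, j): count})."""
    total = 0.0
    for (i, j), x_ij in cooc.items():
        weight = (x_ij / x_max) ** alpha if x_ij < x_max else 1.0
        diff = W[i] @ W_tilde[j] + b[i] + b_tilde[j] - np.log(x_ij)
        total += weight * diff ** 2
    return total

# Toy usage: 5 words, 10-dimensional vectors, a handful of nonzero counts.
rng = np.random.default_rng(0)
V, d = 5, 10
W, W_tilde = rng.normal(size=(V, d)), rng.normal(size=(V, d))
b, b_tilde = np.zeros(V), np.zeros(V)
cooc = {(0, 1): 12.0, (0, 2): 3.0, (3, 4): 1.0}
print(glove_loss(W, W_tilde, b, b_tilde, cooc))
```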

Rehabilitation of Count-Based Models for Word Vector Representations

A systematic study of the use of the Hellinger distance to extract semantic representations from the word co-occurrence statistics of large text corpora shows that this distance gives good performance on word similarity and analogy tasks, with a proper type and size of context, and a dimensionality reduction based on a stochastic low-rank approximation.
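
As a rough illustration of that pipeline (not the paper's exact implementation), the sketch below square-roots the row-normalized co-occurrence counts so that Euclidean distance agrees with the Hellinger distance up to a constant factor, and then applies a truncated SVD as the low-rank step; the toy count matrix is invented.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    return np.sqrt(0.5) * np.linalg.norm(np.sqrt(p) - np.sqrt(q))

def hellinger_embeddings(counts, dim):
    """Square-root the row-normalized co-occurrence matrix, then take a
    truncated SVD; distances in the reduced space approximate (up to the
    1/sqrt(2) factor) the Hellinger distances between co-occurrence rows."""
    probs = counts / counts.sum(axis=1, keepdims=True)
    U, S, _ = np.linalg.svd(np.sqrt(probs), full_matrices=False)
    return U[:, :dim] * S[:dim]

# Toy usage: 4 words, 6 context words.
counts = np.array([[4, 1, 0, 2, 0, 1],
                   [3, 2, 1, 2, 0, 0],
                   [0, 0, 5, 1, 3, 1],
                   [0, 1, 4, 0, 4, 2]], dtype=float)
vecs = hellinger_embeddings(counts, dim=2)
print(vecs.shape)  # (4, 2)
```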

Modeling Semantic Relatedness using Global Relation Vectors

A novel method which directly learns relation vectors from co-occurrence statistics is introduced, and it is shown how relation vectors can be naturally embedded into the resulting vector space.

Measuring Enrichment Of Word Embeddings With Subword And Dictionary Information

Results show that fine-tuning the vectors with semantic information dramatically improves performance in word similarity; conversely, enriching word vectors with subword information increases performance in word analogy tasks, with the hybrid approach finding a solid middle ground.

Modeling Context Words as Regions: An Ordinal Regression Approach to Word Embedding

The underlying ranking interpretation of word contexts is sufficient to match, and sometimes outperform, the performance of popular methods such as Skip-gram, and by using a quadratic kernel, the model can effectively learn word regions, which outperform existing unsupervised models for the task of hypernym detection.

Analyzing Structures in the Semantic Vector Space: A Framework for Decomposing Word Embeddings

A framework is presented for decomposing word embeddings into smaller meaningful units, called sub-vectors, which opens up a wide range of possibilities for analyzing phenomena in vector-space semantics as well as for solving concrete NLP problems.

Word2Box: Learning Word Representation Using Box Embeddings

This model takes a region-based approach to word representation, representing words as n-dimensional rectangles, and provides additional geometric operations such as intersection and containment, which allow it to model co-occurrence patterns that point vectors struggle with.
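
A schematic of the box geometry this summary refers to is shown below: each word is an axis-aligned rectangle given by min/max corners, and intersection volume and containment are the extra operations point vectors lack. This is only an illustration of the representation, not the paper's training procedure (Word2Box learns smoothed boxes from co-occurrence data); the example words and coordinates are invented.

```python
import numpy as np

def box_intersection(lo_a, hi_a, lo_b, hi_b):
    """Intersection of two axis-aligned boxes given by (min, max) corners."""
    return np.maximum(lo_a, lo_b), np.minimum(hi_a, hi_b)

def box_volume(lo, hi):
    """Volume of a box; zero if the corners describe an empty box."""
    return float(np.prod(np.clip(hi - lo, 0.0, None)))

def contains(lo_outer, hi_outer, lo_inner, hi_inner):
    """True if the inner box lies entirely inside the outer box."""
    return bool(np.all(lo_outer <= lo_inner) and np.all(hi_inner <= hi_outer))

# Toy usage in 2 dimensions: "animal" as a broad box, "dog" nested inside it.
animal = (np.array([0.0, 0.0]), np.array([4.0, 4.0]))
dog = (np.array([1.0, 1.0]), np.array([2.0, 2.0]))
print(contains(*animal, *dog))            # True
lo, hi = box_intersection(*animal, *dog)
print(box_volume(lo, hi))                 # 1.0
```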

PAWE: Polysemy Aware Word Embeddings

This work develops a new word embedding model that can accurately represent such words by automatically learning multiple representations for each word, whilst remaining computationally efficient.

Fast PMI-Based Word Embedding with Efficient Use of Unobserved Patterns

A new word embedding algorithm that works on a smoothed Positive Pointwise Mutual Information (PPMI) matrix which is obtained from the word-word co-occurrence counts and a kernel similarity measure for the latent space that can effectively calculate the similarities in high dimensions is proposed.
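
For context on the starting point mentioned above, here is a minimal sketch of building a smoothed PPMI matrix from raw word-context counts; the context-distribution smoothing exponent of 0.75 is a common choice in this line of work and is an assumption here, as is the toy count matrix.

```python
import numpy as np

def ppmi_matrix(counts, smoothing=0.75):
    """Positive pointwise mutual information from a word-context count
    matrix, with context-distribution smoothing on the column marginals."""
    total = counts.sum()
    p_wc = counts / total
    p_w = counts.sum(axis=1, keepdims=True) / total
    ctx = counts.sum(axis=0) ** smoothing
    p_c = (ctx / ctx.sum())[np.newaxis, :]
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    pmi[~np.isfinite(pmi)] = 0.0       # zero out unseen pairs
    return np.maximum(pmi, 0.0)

# Toy usage on a 3x4 count matrix.
counts = np.array([[10, 0, 3, 1],
                   [0, 8, 2, 0],
                   [4, 1, 0, 6]], dtype=float)
print(ppmi_matrix(counts).round(2))
```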

Distributed Representation of Words in Vector Space for Kannada Language

A distributed representation for Kannada words is proposed using an optimal neural network model and combining various known techniques to improve the vector space representation.

Learning Word Vectors with Linear Constraints: A Matrix Factorization Approach

Two new embedding models based on the singular value decomposition of lexical co-occurrences of words are proposed, which allow linear constraints to be injected when performing the decomposition, so that the desired semantic and syntactic information is preserved in the word vectors.

References

Linguistic Regularities in Continuous Space Word Representations

The vector-space word representations that are implicitly learned by the input-layer weights are found to be surprisingly good at capturing syntactic and semantic regularities in language, and that each relationship is characterized by a relation-specific vector offset.
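
The relation-specific vector offsets described above are what the familiar analogy test exploits; a minimal sketch follows, assuming a dict of word vectors (the toy vectors here are random, so the returned word is arbitrary, whereas trained embeddings would tend to return "queen").

```python
import numpy as np

def analogy(a, b, c, vectors, exclude=True):
    """Solve 'a is to b as c is to ?' with the vector-offset method:
    return the word whose vector is closest (by cosine) to b - a + c."""
    target = vectors[b] - vectors[a] + vectors[c]
    target /= np.linalg.norm(target)
    best, best_sim = None, -np.inf
    for word, vec in vectors.items():
        if exclude and word in (a, b, c):
            continue
        sim = vec @ target / np.linalg.norm(vec)
        if sim > best_sim:
            best, best_sim = word, sim
    return best

# Toy usage with made-up 3-d vectors; real vectors would come from a trained
# model such as GloVe or word2vec.
rng = np.random.default_rng(1)
vectors = {w: rng.normal(size=3) for w in ["king", "queen", "man", "woman", "apple"]}
print(analogy("man", "king", "woman", vectors))
```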

Linguistic Regularities in Sparse and Explicit Word Representations

It is demonstrated that analogy recovery is not restricted to neural word embeddings, and that a similar amount of relational similarities can be recovered from traditional distributional word representations.

Better Word Representations with Recursive Neural Networks for Morphology

This paper combines recursive neural networks, where each morpheme is a basic unit, with neural language models to consider contextual information in learning morphologically aware word representations, and proposes a novel model capable of building representations for morphologically complex words from their morphemes.

Efficient Estimation of Word Representations in Vector Space

Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.
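
One of the two architectures proposed there is the skip-gram model; the sketch below shows its per-pair loss with a plain softmax over the vocabulary (the paper's hierarchical softmax is an efficiency device, not shown), with toy dimensions and random vectors chosen for illustration.

```python
import numpy as np

def skipgram_softmax_loss(center_vec, context_idx, output_matrix):
    """Negative log-likelihood of one context word given a center word under
    the skip-gram model, using a full softmax over the vocabulary."""
    scores = output_matrix @ center_vec      # one score per vocabulary word
    scores -= scores.max()                   # stabilize the softmax
    log_probs = scores - np.log(np.exp(scores).sum())
    return -log_probs[context_idx]

# Toy usage: vocabulary of 8 words, 5-dimensional vectors.
rng = np.random.default_rng(3)
input_vecs = rng.normal(size=(8, 5))    # "input" (center-word) embeddings
output_vecs = rng.normal(size=(8, 5))   # "output" (context-word) embeddings
print(skipgram_softmax_loss(input_vecs[2], context_idx=5, output_matrix=output_vecs))
```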

Improving Word Representations via Global Context and Multiple Word Prototypes

A new neural network architecture is presented which learns word embeddings that better capture the semantics of words by incorporating both local and global document context, and accounts for homonymy and polysemy by learning multiple embeddings per word.

Learning word embeddings efficiently with noise-contrastive estimation

This work proposes a simple and scalable new approach to learning word embeddings based on training log-bilinear models with noise-contrastive estimation, and achieves results comparable to the best ones reported, using four times less data and more than an order of magnitude less computing time.
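
The noise-contrastive estimation referred to above trains a logistic classifier to separate an observed (context, word) pair from k samples drawn from a noise distribution; the sketch below uses a generic dot-product score and a flat, assumed noise distribution rather than the paper's log-bilinear model, so it is only a schematic of the objective.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nce_loss(score_data, noise_p_data, scores_noise, noise_p_noise, k):
    """Per-example NCE loss: classify the observed word against k noise
    samples, with each score shifted by log(k * P_noise) of that word."""
    pos = np.log(sigmoid(score_data - np.log(k * noise_p_data)))
    neg = np.sum(np.log(sigmoid(-(scores_noise - np.log(k * noise_p_noise)))))
    return -(pos + neg)

# Toy usage: dot-product scores between a context vector and word vectors,
# with a flat (assumed) noise distribution over a tiny vocabulary.
rng = np.random.default_rng(4)
ctx, data_word = rng.normal(size=16), rng.normal(size=16)
noise_words = rng.normal(size=(5, 16))
print(nce_loss(score_data=ctx @ data_word,
               noise_p_data=0.01,
               scores_noise=noise_words @ ctx,
               noise_p_noise=np.full(5, 0.01),
               k=5))
```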

Word Representations: A Simple and General Method for Semi-Supervised Learning

This work evaluates Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking, and finds that each of the three word representations improves the accuracy of these baselines.

A Neural Probabilistic Language Model

This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.

Word Embeddings through Hellinger PCA

This work proposes to drastically simplify the computation of word embeddings through a Hellinger PCA of the word co-occurrence matrix, and shows that it can provide an easy way to adapt embeddings to specific tasks.

An Improved Model of Semantic Similarity Based on Lexical Co-Occurrence

A new vector-space method for deriving word meanings from large corpora is introduced; it was inspired by the HAL and LSA models but achieves better and more consistent results in predicting human similarity judgments.