Corpus ID: 1190093

Neural Word Embedding as Implicit Matrix Factorization

@inproceedings{Levy2014NeuralWE,
  title={Neural Word Embedding as Implicit Matrix Factorization},
  author={Omer Levy and Yoav Goldberg},
  booktitle={NIPS},
  year={2014}
}
We analyze skip-gram with negative-sampling (SGNS), a word embedding method introduced by Mikolov et al., and show that it is implicitly factorizing a word-context matrix, whose cells are the pointwise mutual information (PMI) of the respective word and context pairs, shifted by a global constant. We find that another embedding method, NCE, is implicitly factorizing a similar matrix, where each cell is the (shifted) log conditional probability of a word given its context. We show that using a…
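To make the main claim concrete: the global constant in the factorized matrix is log k, where k is the number of negative samples, so SGNS implicitly seeks word and context vectors with w · c ≈ PMI(w, c) - log k. The snippet below is a minimal sketch, not the authors' code, of the shifted positive PMI (SPPMI) construction and SVD factorization that the paper compares against SGNS; the toy corpus, the window size of 2, k = 5, and the embedding dimension are illustrative assumptions.

```python
# Minimal sketch of the matrix the paper says SGNS implicitly factorizes:
# M[w, c] = PMI(w, c) - log(k), and the SPPMI/SVD alternative it evaluates.
# Toy corpus, window size, k, and dimension d are illustrative assumptions.
from collections import Counter

import numpy as np

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
]
window = 2
k = 5  # number of negative samples; the shift is log(k)

# Count word-context co-occurrences within the symmetric window.
word_counts, context_counts, pair_counts = Counter(), Counter(), Counter()
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i == j:
                continue
            c = sent[j]
            word_counts[w] += 1
            context_counts[c] += 1
            pair_counts[(w, c)] += 1

vocab = sorted(word_counts)
idx = {w: i for i, w in enumerate(vocab)}
total = sum(pair_counts.values())

# Shifted positive PMI matrix: max(PMI(w, c) - log k, 0).
sppmi = np.zeros((len(vocab), len(vocab)))
for (w, c), n_wc in pair_counts.items():
    pmi = np.log(n_wc * total / (word_counts[w] * context_counts[c]))
    sppmi[idx[w], idx[c]] = max(pmi - np.log(k), 0.0)

# Truncated SVD gives dense word vectors, analogous to the paper's SVD baseline.
d = 2  # embedding dimension (illustrative)
U, S, Vt = np.linalg.svd(sppmi)
word_vectors = U[:, :d] * np.sqrt(S[:d])  # square-root weighting of singular values
print(word_vectors.shape)  # (vocabulary size, d)
```

Weighting the left singular vectors by the square root of the singular values is the symmetric weighting the paper discusses for the SVD-based embeddings; using the unweighted or fully weighted factors is an equally valid choice in this sketch.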
Word Embedding Revisited: A New Representation Learning and Explicit Matrix Factorization Perspective
It is pointed out that SGNS is essentially a representation learning method, which learns to represent the co-occurrence vector for a word, and that extended supervised word embedding can be established based on the proposed representation learning view.
Fast PMI-Based Word Embedding with Efficient Use of Unobserved Patterns
A new word embedding algorithm is proposed that works on a smoothed positive pointwise mutual information (PPMI) matrix obtained from word-word co-occurrence counts, together with a kernel similarity measure for the latent space that can effectively compute similarities in high dimensions.
Word Embeddings via Tensor Factorization
It is shown that embeddings based on tensor factorization can be used to discern the various meanings of polysemous words without being explicitly trained to do so, and the paper motivates the intuition behind why this works in a way that existing methods cannot.
WordRank: Learning Word Embeddings via Robust Ranking
This paper argues that word embedding can be naturally viewed as a ranking problem due to the ranking nature of the evaluation metrics, and proposes a novel framework, WordRank, that efficiently estimates word representations via robust ranking, in which the attention mechanism and robustness to noise are readily achieved via DCG-like ranking losses.
Exponential Family Word Embeddings: An Iterative Approach for Learning Word Vectors
GloVe and Skip-gram word embedding methods learn word vectors by decomposing a denoised matrix of word co-occurrences into a product of low-rank matrices. In this work, we propose an iterative…
A Generative Word Embedding Model and its Low Rank Positive Semidefinite Solution
This work proposes a generative word embedding model that is easy to interpret and can serve as a basis for more sophisticated latent factor models; it is competitive with word2vec and better than other MF-based methods.
Continuous Word Embedding Fusion via Spectral Decomposition
This paper builds on the established view of word embeddings as matrix factorizations to present a spectral algorithm for this task, and demonstrates that the method is able to embed the new words efficiently into the original embedding space.
Word Embedding With Zipf’s Context
A simpler but efficient word embedding method based on co-occurrence matrix factorization according to Zipf's word frequency law is proposed; it shows comparable performance despite being much simpler than neural language models.
Spectral Word Embedding with Negative Sampling
This work examines the notion of “negative examples”, the unobserved or insignificant word-context co-occurrences, in spectral methods, and proposes a new formulation of the word embedding problem with an intuitive objective function that justifies the use of negative examples.
Interpreting Word Embeddings with Eigenvector Analysis
Dense word vectors have proven their value in many downstream NLP tasks over the past few years. However, the dimensions of such embeddings are not easily interpretable. Out of the d dimensions in a…

References

Showing 1-10 of 35 references
Linguistic Regularities in Sparse and Explicit Word Representations
It is demonstrated that analogy recovery is not restricted to neural word embeddings, and that a similar amount of relational similarities can be recovered from traditional distributional word representations.
Learning word embeddings efficiently with noise-contrastive estimation
This work proposes a simple and scalable new approach to learning word embeddings based on training log-bilinear models with noise-contrastive estimation, and achieves results comparable to the best ones reported, using four times less data and more than an order of magnitude less computing time.
Distributed Representations of Words and Phrases and their Compositionality
This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
Linguistic Regularities in Continuous Space Word Representations
The vector-space word representations implicitly learned by the input-layer weights are found to be surprisingly good at capturing syntactic and semantic regularities in language, and each relationship is found to be characterized by a relation-specific vector offset.
Extracting semantic representations from word co-occurrence statistics: stop-lists, stemming, and SVD
This article investigates the use of three further factors, namely the application of stop-lists, word stemming, and dimensionality reduction using singular value decomposition (SVD), that have been used to provide improved performance elsewhere; it also introduces an additional semantic task and explores the advantages of using a much larger corpus.
word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method
This note is an attempt to explain equation (4) (negative sampling) in "Distributed Representations of Words and Phrases and their Compositionality" by Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado and Jeffrey Dean.
Word Representations: A Simple and General Method for Semi-Supervised Learning
This work evaluates Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking, and finds that each of the three word representations improves the accuracy of these baselines.
A Neural Probabilistic Language Model
This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.
Dependency-Based Word Embeddings
The skip-gram model with negative sampling introduced by Mikolov et al. is generalized to include arbitrary contexts, and experiments with dependency-based contexts are performed, showing that they produce markedly different embeddings.
Efficient Estimation of Word Representations in Vector Space
Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed, and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.