# Neural Word Embedding as Implicit Matrix Factorization

```bibtex
@inproceedings{Levy2014NeuralWE,
  title     = {Neural Word Embedding as Implicit Matrix Factorization},
  author    = {Omer Levy and Yoav Goldberg},
  booktitle = {NIPS},
  year      = {2014}
}
```

We analyze skip-gram with negative-sampling (SGNS), a word embedding method introduced by Mikolov et al., and show that it is implicitly factorizing a word-context matrix, whose cells are the pointwise mutual information (PMI) of the respective word and context pairs, shifted by a global constant. We find that another embedding method, NCE, is implicitly factorizing a similar matrix, where each cell is the (shifted) log conditional probability of a word given its context. We show that using a…
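The paper's headline result is that the SGNS objective is optimized when the dot product of a word and context vector equals PMI(w, c) − log k, where k is the number of negative samples; the paper accordingly proposes factorizing the shifted positive PMI (SPPMI) matrix explicitly with SVD. The sketch below illustrates that recipe; the function name, the dense `cooc` count matrix, and the default parameters are illustrative assumptions, not the authors' implementation (a real vocabulary calls for sparse matrices and a truncated sparse SVD such as `scipy.sparse.linalg.svds`).

```python
# Minimal sketch (not the authors' released code): build the shifted positive
# PMI (SPPMI) matrix from word-context co-occurrence counts and factorize it
# with SVD, the explicit counterpart of what SGNS does implicitly.
import numpy as np

def sppmi_embeddings(cooc: np.ndarray, k: int = 5, dim: int = 100) -> np.ndarray:
    """cooc[i, j] = #(w_i, c_j) co-occurrence count; k = SGNS negative samples."""
    total = cooc.sum()                         # |D|, total number of pairs
    word = cooc.sum(axis=1, keepdims=True)     # #(w), row marginals
    context = cooc.sum(axis=0, keepdims=True)  # #(c), column marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        # PMI(w, c) = log( #(w,c) * |D| / (#(w) * #(c)) )
        pmi = np.log((cooc * total) / (word * context))
    pmi[~np.isfinite(pmi)] = 0.0               # zero counts give -inf/NaN; treat as 0
    sppmi = np.maximum(pmi - np.log(k), 0.0)   # shift by log k and clip at zero
    # Truncated SVD; the paper takes W = U_d * sqrt(Sigma_d) as the word vectors
    # to split the spectrum symmetrically between words and contexts.
    u, s, _ = np.linalg.svd(sppmi, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])
```

Note that with k = 1 the shift vanishes (log 1 = 0) and this reduces to ordinary PPMI followed by SVD.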

#### 1,490 Citations

Word Embedding Revisited: A New Representation Learning and Explicit Matrix Factorization Perspective

- Computer Science
- IJCAI
- 2015

It is pointed out that SGNS is essentially a representation learning method that learns to represent the co-occurrence vector of a word, and that supervised extensions of word embedding can be established based on the proposed representation learning view.

Fast PMI-Based Word Embedding with Efficient Use of Unobserved Patterns

- Computer Science
- AAAI
- 2019

A new word embedding algorithm is proposed that works on a smoothed positive pointwise mutual information (PPMI) matrix obtained from word-word co-occurrence counts, together with a kernel similarity measure for the latent space that can effectively compute similarities in high dimensions.

Word Embeddings via Tensor Factorization

- Mathematics, Computer Science
- ArXiv
- 2017

It is shown that embeddings based on tensor factorization can be used to discern the various meanings of polysemous words without being explicitly trained to do so, and the intuition for why this works with tensor factorization but not with existing methods is motivated.

WordRank: Learning Word Embeddings via Robust Ranking

- Computer Science, Mathematics
- EMNLP
- 2016

This paper argues that word embedding can be naturally viewed as a ranking problem due to the ranking nature of the evaluation metrics, and proposes WordRank, a novel framework that efficiently estimates word representations via robust ranking, in which attention and robustness to noise are readily achieved via DCG-like ranking losses.

Exponential Family Word Embeddings: An Iterative Approach for Learning Word Vectors

- Mathematics
- 2018

GloVe and Skip-gram word embedding methods learn word vectors by decomposing a denoised matrix of word co-occurrences into a product of low-rank matrices. In this work, we propose an iterative…

A Generative Word Embedding Model and its Low Rank Positive Semidefinite Solution

- Computer Science, Mathematics
- EMNLP
- 2015

This work proposes a generative word embedding model that is easy to interpret and can serve as a basis for more sophisticated latent factor models; it is competitive with word2vec and better than other MF-based methods.

Continuous Word Embedding Fusion via Spectral Decomposition

- Computer Science
- CoNLL
- 2018

This paper builds on the established view of word embeddings as matrix factorizations to present a spectral algorithm for this fusion task, and demonstrates that the method is able to embed new words efficiently into the original embedding space.

Word Embedding With Zipf’s Context

- Computer Science
- IEEE Access
- 2019

A simple but efficient word embedding method is proposed, based on co-occurrence matrix factorization weighted according to Zipf's word-frequency law; it shows comparable performance despite being much simpler than neural language models.

Spectral Word Embedding with Negative Sampling

- Computer Science
- AAAI
- 2018

This work examines the notion of "negative examples", the unobserved or insignificant word-context co-occurrences, in spectral methods, and proposes a new formulation of the word embedding problem with an intuitive objective function that justifies the use of negative examples.

Interpreting Word Embeddings with Eigenvector Analysis

- Computer Science
- 2018

Dense word vectors have proven their value in many downstream NLP tasks over the past few years. However, the dimensions of such embeddings are not easily interpretable. Out of the d dimensions in a…

#### References

Showing 1–10 of 35 references.

Linguistic Regularities in Sparse and Explicit Word Representations

- Computer Science
- CoNLL
- 2014

It is demonstrated that analogy recovery is not restricted to neural word embeddings, and that a similar amount of relational similarity can be recovered from traditional distributional word representations.

Learning word embeddings efficiently with noise-contrastive estimation

- Computer Science
- NIPS
- 2013

This work proposes a simple and scalable new approach to learning word embeddings based on training log-bilinear models with noise-contrastive estimation, and achieves results comparable to the best ones reported, using four times less data and more than an order of magnitude less computing time.
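Read alongside the main paper above, this NCE objective admits the same implicit-factorization reading: its optimum corresponds to a word-context matrix of shifted log conditional probabilities. Schematically (notation is mine):

```latex
% Per the main paper's analysis, NCE implicitly factorizes a matrix whose
% cells are log conditional probabilities shifted by log k:
\vec{w} \cdot \vec{c} = \log P(w \mid c) - \log k
```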

Distributed Representations of Words and Phrases and their Compositionality

- Computer Science, Mathematics
- NIPS
- 2013

This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.

Linguistic Regularities in Continuous Space Word Representations

- Computer Science
- NAACL
- 2013

The vector-space word representations implicitly learned by the input-layer weights are found to be surprisingly good at capturing syntactic and semantic regularities in language, with each relationship characterized by a relation-specific vector offset.

Extracting semantic representations from word co-occurrence statistics: stop-lists, stemming, and SVD

- Computer Science, Medicine
- Behavior research methods
- 2012

This article investigates the use of three further factors, namely the application of stop-lists, word stemming, and dimensionality reduction using singular value decomposition (SVD), that have been used to provide improved performance elsewhere; it also introduces an additional semantic task and explores the advantages of using a much larger corpus.

word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method

- Computer Science, Mathematics
- ArXiv
- 2014

This note is an attempt to explain equation (4) (negative sampling) in "Distributed Representations of Words and Phrases and their Compositionality" by Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado and Jeffrey Dean.
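For orientation, the equation the note unpacks is SGNS's local objective for a word-context pair (w, c) with k negative samples; a standard rendering (notation is mine, following the note):

```latex
% SGNS objective for a single (w, c) pair; negative contexts c_N are drawn
% from the empirical unigram distribution P_D, and sigma is the logistic sigmoid:
\ell(w, c) = \log \sigma(\vec{w} \cdot \vec{c})
           + k \cdot \mathbb{E}_{c_N \sim P_D}\!\left[\log \sigma(-\vec{w} \cdot \vec{c}_N)\right]
% The main paper shows this is optimized when
% \vec{w} \cdot \vec{c} = \mathrm{PMI}(w, c) - \log k.
```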

Word Representations: A Simple and General Method for Semi-Supervised Learning

- Computer Science
- ACL
- 2010

This work evaluates Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) word embeddings on both NER and chunking, and finds that each of the three word representations improves the accuracy of these baselines.

A Neural Probabilistic Language Model

- Computer Science
- J. Mach. Learn. Res.
- 2003

This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.

Dependency-Based Word Embeddings

- Computer Science
- ACL
- 2014

The skip-gram model with negative sampling introduced by Mikolov et al. is generalized to include arbitrary contexts, and experiments with dependency-based contexts are performed, showing that they produce markedly different embeddings.

Efficient Estimation of Word Representations in Vector Space

- Computer Science
- ICLR
- 2013

Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.