• Corpus ID: 5490051

Random Walks on Context Spaces: Towards an Explanation of the Mysteries of Semantic Word Embeddings

Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski
The papers of Mikolov et al. (2013), as well as subsequent works, have led to dramatic progress in solving word analogy tasks using semantic word embeddings. This progress leverages linear structure that is often found in word embeddings, which is surprising since the training method is usually nonlinear. There were attempts, notably by Levy and Goldberg and by Pennington et al., to explain how this linear structure arises. The current paper points out the gaps in these explanations and provides a more…
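The linear structure mentioned in the abstract is usually exploited by solving "a is to b as c is to ?" with a vector offset. The following is a minimal sketch of that idea, not the paper's method: the two-dimensional vectors in `TOY_VECTORS` are invented for illustration, and the helper names `cosine` and `solve_analogy` are hypothetical.

```python
import math

# Toy 2-D vectors invented for illustration only; real embeddings are
# learned from a corpus and have hundreds of dimensions.
TOY_VECTORS = {
    "man": (1.0, 0.0),
    "woman": (1.0, 1.0),
    "king": (3.0, 0.0),
    "queen": (3.0, 1.0),
    "apple": (0.0, 3.0),
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' by the vector-offset rule.

    Builds the target v_b - v_a + v_c and returns the word whose vector
    has the highest cosine similarity to it, excluding the query words.
    """
    target = tuple(
        vb - va + vc
        for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])
    )
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


if __name__ == "__main__":
    print(solve_analogy("man", "woman", "king", TOY_VECTORS))
```

On these hand-picked vectors the offset from "man" to "woman" equals the offset from "king" to "queen", so the rule works by construction; whether learned embeddings behave this way is the empirical question the paper sets out to explain.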


Word Embeddings as Metric Recovery in Semantic Spaces

A simple, principled, direct metric recovery algorithm is proposed that performs on par with state-of-the-art word embedding and manifold learning methods. The work also constructs two new inductive reasoning datasets and demonstrates that word embeddings can be used to solve them.

WordRank: Learning Word Embeddings via Robust Ranking

This paper argues that word embedding can be naturally viewed as a ranking problem due to the ranking nature of the evaluation metrics, and proposes a novel framework WordRank that efficiently estimates word representations via robust ranking, in which the attention mechanism and robustness to noise are readily achieved via the DCG-like ranking losses.

Exponential Word Embeddings: Models and Approximate Learning

This thesis shows that a representation based on multiple vectors per word easily overcomes this limitation by having different vectors represent the different meanings of a word, which is especially beneficial when training data is noisy and scarce.

Inside Out: Two Jointly Predictive Models for Word Representations and Phrase Representations

Two novel models, called BEING and SEING, are proposed to build better word representations by modeling both external contexts and internal morphemes in a jointly predictive way; they can significantly outperform state-of-the-art models on both word and phrase representation learning.

Dynamic Word Embeddings for Evolving Semantic Discovery

A dynamic statistical model is developed that simultaneously learns time-aware embeddings and solves the resulting alignment problem; it consistently outperforms state-of-the-art temporal embedding approaches on both semantic accuracy and alignment quality.

Using Word Embeddings for Ontology Enrichment

This study investigates whether the success of word2vec, a neural-network-based word embedding algorithm, can be replicated in an agglutinative language like Turkish, and proposes a simple yet effective weakly supervised ontology enrichment algorithm.

Take and Took, Gaggle and Goose, Book and Read: Evaluating the Utility of Vector Differences for Lexical Relation Learning

It is found that word embeddings capture a surprising amount of information, and that, under suitable supervised training, vector subtraction generalises well to a broad range of relations, including over unseen lexical items.

Word, graph and manifold embedding from Markov processes

This paper generalizes metric recovery to graphs and manifolds, relating co-occurrence counts on random walks in graphs and random processes on manifolds to the underlying metric to be recovered, thereby reconciling manifold estimation and embedding algorithms.

A Generative Word Embedding Model and its Low Rank Positive Semidefinite Solution

This work proposes a generative word embedding model that is easy to interpret and can serve as a basis for more sophisticated latent factor models; it is competitive with word2vec and better than other MF-based methods.

Robust Gram Embeddings

This work proposes a regularized embedding formulation, called Robust Gram (RG), which penalizes overfitting by suppressing the disparity between target and context embeddings. It shows that the RG model trained on small datasets generalizes better than alternatives, is more robust to variations in the training set, and correlates well with human similarity judgments in a set of word similarity tasks.



word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method

This note is an attempt to explain equation (4) (negative sampling) in "Distributed Representations of Words and Phrases and their Compositionality" by Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado and Jeffrey Dean.

Learning word embeddings efficiently with noise-contrastive estimation

This work proposes a simple and scalable new approach to learning word embeddings based on training log-bilinear models with noise-contrastive estimation, and achieves results comparable to the best ones reported, using four times less data and more than an order of magnitude less computing time.

Linguistic Regularities in Continuous Space Word Representations

The vector-space word representations that are implicitly learned by the input-layer weights are found to be surprisingly good at capturing syntactic and semantic regularities in language, and each relationship is found to be characterized by a relation-specific vector offset.

GloVe: Global Vectors for Word Representation

A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.

Neural Probabilistic Language Models

This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences, and incorporates this new language model into a state-of-the-art speech recognizer of conversational speech.

Neural Word Embedding as Implicit Matrix Factorization

It is shown that using a sparse Shifted Positive PMI word-context matrix to represent words improves results on two word similarity tasks and one of two analogy tasks, and it is conjectured that this stems from the weighted nature of SGNS's factorization.

Linguistic Regularities in Sparse and Explicit Word Representations

It is demonstrated that analogy recovery is not restricted to neural word embeddings, and that a similar amount of relational similarities can be recovered from traditional distributional word representations.

Joint Learning of Words and Meaning Representations for Open-Text Semantic Parsing

This work proposes a method that learns to assign MRs to a wide range of text thanks to a training scheme that combines learning from knowledge bases with learning from raw text.

An Improved Model of Semantic Similarity Based on Lexical Co-Occurrence

A new vector-space method for deriving word-meanings from large corpora that was inspired by the HAL and LSA models, but which achieves better and more consistent results in predicting human similarity judgments is introduced.

A unified architecture for natural language processing: deep neural networks with multitask learning

We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic