Spectral Word Embedding with Negative Sampling

Behrouz Haji Soleimani and Stan Matwin
In this work, we investigate word embedding algorithms in the context of natural language processing. In particular, we examine the notion of "negative examples", the unobserved or insignificant word-context co-occurrences, in spectral methods. We provide a new formulation of the word embedding problem by proposing a new, intuitive objective function that justifies the use of negative examples. In fact, our algorithm not only learns from the important word-context co-occurrences…
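As a rough illustration of the spectral approach the abstract describes, the sketch below builds a Positive PMI matrix from toy word-word co-occurrence counts and factorizes it with a truncated SVD to obtain word vectors. The corpus, counts, and variable names are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch of a spectral word embedding: build a PPMI matrix
# from word-word co-occurrence counts, then factorize it with a truncated
# SVD. The toy counts and rank are illustrative only.
import numpy as np

# Toy co-occurrence counts C[i, j] for a 4-word vocabulary.
C = np.array([[0, 4, 1, 0],
              [4, 0, 2, 1],
              [1, 2, 0, 3],
              [0, 1, 3, 0]], dtype=float)

total = C.sum()
p_ij = C / total                      # joint probabilities
p_i = C.sum(axis=1) / total           # marginal probabilities
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log(p_ij / np.outer(p_i, p_i))
ppmi = np.maximum(pmi, 0.0)           # clamp negatives (and -inf) to zero
ppmi[np.isnan(ppmi)] = 0.0

# Spectral step: a rank-k truncated SVD of the PPMI matrix yields the vectors.
U, S, Vt = np.linalg.svd(ppmi)
k = 2
embeddings = U[:, :k] * np.sqrt(S[:k])   # rank-k word vectors
print(embeddings.shape)                  # → (4, 2)
```

Note that in this construction the zero entries of the PPMI matrix are exactly the "negative examples" (unobserved co-occurrences) that the paper's objective function accounts for.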


Fast PMI-Based Word Embedding with Efficient Use of Unobserved Patterns
A new word embedding algorithm that works on a smoothed Positive Pointwise Mutual Information (PPMI) matrix which is obtained from the word-word co-occurrence counts and a kernel similarity measure for the latent space that can effectively calculate the similarities in high dimensions is proposed.
On SkipGram Word Embedding Models with Negative Sampling: Unified Framework and Impact of Noise Distributions
A framework for word embedding, referred to as Word-Context Classification (WCC), is formulated that generalizes SGN to a wide family of models, and several novel embedding models are discovered that outperform the existing WCC models.
Efficient Unsupervised Word Sense Induction, Disambiguation and Embedding
An efficient word sense disambiguation and embedding algorithm that learns multi-prototype sense vectors to accommodate different meanings of words and achieves the state-of-the-art accuracy in a more efficient way.
Learning Word Embeddings without Context Vectors
This work suggests using an indefinite inner product in the skip-gram negative sampling algorithm, which allows for only one set of vectors in word embedding algorithms, and performs on par with SGNS on word similarity datasets.
Topic discovery by spectral decomposition and clustering with coordinated global and local contexts
A novel coordinated embedding topic model (CETM), which incorporates spectral decomposition and clustering technique by leveraging both global and local context information to discover topics and achieves significantly better performance in terms of topic coherence and text classification.
A survey on deep learning for textual emotion analysis in social networks
DeepMiner at SemEval-2018 Task 1: Emotion Intensity Recognition Using Deep Representation Learning
A multi-aspect feature learning mechanism to capture the most discriminative semantic features of a tweet as well as the emotion information conveyed by each word in it and achieves a Pearson correlation of 72% on the task of tweet emotion intensity prediction.
Question Answering in Knowledge Bases
A novel model architecture, APVA, is presented; it takes advantage of KB-based information to improve relation prediction and verifies the correctness of the predicted relation by means of simple negative sampling in a logistic regression framework.
Learning Embeddings for Text and Images from Structure of the Data
The author states that the book was not intended to be taken as gospel, but rather as a guide to further studies.


Neural Word Embedding as Implicit Matrix Factorization
It is shown that using a sparse Shifted Positive PMI word-context matrix to represent words improves results on two word similarity tasks and one of two analogy tasks, and it is conjectured that this stems from the weighted nature of SGNS's factorization.
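The Shifted Positive PMI matrix referenced here can be written as SPPMI(w, c) = max(PMI(w, c) − log k, 0), where k is the number of negative samples. A minimal sketch under that assumption, with toy dense counts:

```python
# Minimal sketch of Shifted Positive PMI (SPPMI): subtract log k (the
# negative-sampling count) from PMI and clamp at zero. Toy counts only.
import numpy as np

counts = np.array([[10.0, 2.0],
                   [3.0, 5.0]])
total = counts.sum()
pw = counts.sum(axis=1, keepdims=True) / total   # word marginals
pc = counts.sum(axis=0, keepdims=True) / total   # context marginals
pmi = np.log((counts / total) / (pw * pc))

k = 5                                            # number of negative samples
sppmi = np.maximum(pmi - np.log(k), 0.0)
```

Shifting by log k before clamping is what links this matrix factorization view to SGNS with k negative samples.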
GloVe: Global Vectors for Word Representation
A new global logbilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods and produces a vector space with meaningful substructure.
Distributed Representations of Sentences and Documents
Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents, and its construction gives the algorithm the potential to overcome the weaknesses of bag-of-words models.
Improving Distributional Similarity with Lessons Learned from Word Embeddings
It is revealed that much of the performance gains of word embeddings are due to certain system design choices and hyperparameter optimizations, rather than the embedding algorithms themselves, and these modifications can be transferred to traditional distributional models, yielding similar gains.
Distributed Representations of Words and Phrases and their Compositionality
This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.
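The negative-sampling objective this summary mentions scores one observed (word, context) pair against k sampled "negative" contexts. A hedged sketch with random toy vectors (a real model would train these):

```python
# Sketch of the negative-sampling loss for one (word, context) pair:
# minimize -log sigmoid(w.c_pos) - sum over k negatives of log sigmoid(-w.c_neg).
# Vectors are random toys here, purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
dim, k = 8, 5
w = rng.normal(size=dim)           # target word vector
c_pos = rng.normal(size=dim)       # observed context vector
c_neg = rng.normal(size=(k, dim))  # k sampled negative contexts

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

loss = -np.log(sigmoid(w @ c_pos)) - np.log(sigmoid(-(c_neg @ w))).sum()
```

Minimizing this loss pushes the observed pair's dot product up and the sampled negatives' dot products down, which is what makes it a cheap alternative to the full softmax.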
Neural Probabilistic Language Models
This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences, and incorporates this new language model into a state-of-the-art speech recognizer of conversational speech.
A Neural Probabilistic Language Model
This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.
Efficient Estimation of Word Representations in Vector Space
Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.
Word Association Norms, Mutual Information and Lexicography
The proposed measure, the association ratio, estimates word association norms directly from computer readable corpora, making it possible to estimate norms for tens of thousands of words.
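The association ratio is essentially pointwise mutual information estimated from corpus counts, log2(P(x, y) / (P(x) P(y))). A tiny worked example with made-up counts:

```python
# The association ratio as pointwise mutual information, estimated from
# corpus counts. All numbers below are illustrative, not from the paper.
import math

N = 1_000_000   # corpus size (illustrative)
count_x = 500   # occurrences of word x
count_y = 800   # occurrences of word y
count_xy = 40   # co-occurrences of x and y within a window

pmi = math.log2((count_xy / N) / ((count_x / N) * (count_y / N)))
print(round(pmi, 2))  # → 6.64
```

A value well above zero, as here, indicates the pair co-occurs far more often than chance; independent words would score near zero.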
Class-Based n-gram Models of Natural Language
This work addresses the problem of predicting a word from previous words in a sample of text and discusses n-gram models based on classes of words, finding that these models are able to extract classes that have the flavor of either syntactically based groupings or semantically based groupings, depending on the nature of the underlying statistics.