Corpus ID: 16447573

Distributed Representations of Words and Phrases and their Compositionality

@inproceedings{Mikolov2013DistributedRO,
  title={Distributed Representations of Words and Phrases and their Compositionality},
  author={Tomas Mikolov and Ilya Sutskever and Kai Chen and Gregory S. Corrado and Jeffrey Dean},
  booktitle={NIPS},
  year={2013}
}
The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the… 
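
The subsampling of frequent words mentioned in the abstract can be sketched in a few lines. The discard rule 1 - sqrt(t / f(w)), where f(w) is a word's relative corpus frequency and t a small threshold, follows the paper; the function name, the toy corpus, and the larger t used in the demo are illustrative assumptions, not the authors' implementation.

# Minimal sketch of the frequent-word subsampling mentioned in the
# abstract. The discard probability 1 - sqrt(t / f(w)) and the idea of
# a small threshold t come from the paper; the function name, the toy
# corpus, and the t value used below are illustrative choices.
import random
from collections import Counter

def subsample(tokens, t=1e-5, seed=0):
    rng = random.Random(seed)
    total = len(tokens)
    freq = {w: c / total for w, c in Counter(tokens).items()}
    kept = []
    for w in tokens:
        p_discard = max(0.0, 1.0 - (t / freq[w]) ** 0.5)
        if rng.random() >= p_discard:
            kept.append(w)
    return kept

# Toy demo: with only nine distinct words every token counts as
# "frequent", so a larger t is used here just to show the effect;
# real corpora use t around 1e-5 and mostly drop stop words.
corpus = ("the quick brown fox jumps over the lazy dog " * 1000).split()
kept = subsample(corpus, t=1e-3)
print(len(corpus), len(kept), Counter(kept).most_common(2))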

Citations

"The Sum of Its Parts": Joint Learning of Word and Phrase Representations with Autoencoders
TLDR
A novel model that jointly learns word vector representations and their summation is introduced, and it is shown that these representations give better performance on phrase evaluation tasks.
A Hierarchical Playscript Representation of Distributed Words for Effective Semantic Clustering and Search
TLDR
This work analyzes the clustering of Shakespeare's complete set of plays using multidimensional scaling for visualization, and experiments with playscript searches of both contiguous and out-of-order parts of dialogues, reporting robust results that support the intuition behind measuring play-to-play and dialogue-to-play similarity.
Rehabilitation of Count-Based Models for Word Vector Representations
TLDR
A systematic study of the use of the Hellinger distance to extract semantic representations from the word co-occurrence statistics of large text corpora shows that this distance gives good performance on word similarity and analogy tasks, given an appropriate type and size of context and a dimensionality reduction based on a stochastic low-rank approximation.
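
As a rough illustration of the Hellinger-distance idea in the entry above, the sketch below compares normalized co-occurrence rows for a few toy words; the helper names and counts are assumptions made for illustration only, not the authors' code or data.

# Minimal sketch: Hellinger distance between two co-occurrence
# distributions (rows of a word-context count matrix, normalized).
# Toy counts are illustrative; not the authors' data or code.
import math

def hellinger(p, q):
    # p, q are probability distributions over the same contexts
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2
                         for a, b in zip(p, q))) / math.sqrt(2)

def normalize(counts):
    total = sum(counts)
    return [c / total for c in counts]

cat = normalize([10, 3, 0, 7])   # co-occurrence counts for "cat"
dog = normalize([9, 4, 1, 6])    # co-occurrence counts for "dog"
car = normalize([0, 1, 12, 2])   # co-occurrence counts for "car"
print(hellinger(cat, dog), hellinger(cat, car))  # cat is closer to dog
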
Towards Learning Word Representation
TLDR
The underlying idea of the technique is to represent a word as a bag of syllable and letter n-grams, provide a vector representation for each extracted syllable-based and letter-based n-gram, and concatenate these representations.
Enriching Word Vectors with Subword Information
TLDR
A new approach based on the skipgram model, where each word is represented as a bag of character n-grams, with words being represented as the sum of these representations, which achieves state-of-the-art performance on word similarity and analogy tasks.
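
The bag-of-character-n-grams representation summarized above can be sketched as follows. The '<' and '>' word-boundary markers and the summation of n-gram vectors follow the paper's description; the n-gram range defaults and the random toy embedding table are assumptions made only for illustration.

# Sketch of representing a word as a bag of character n-grams
# (the subword idea summarized above). The '<' and '>' boundary
# markers follow the paper's description; the n-gram range and the
# toy embedding lookup are illustrative assumptions.
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    padded = f"<{word}>"
    grams = {padded}  # the full padded word is kept as its own feature
    for n in range(n_min, n_max + 1):
        grams.update(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams

# A word vector is then the sum of the vectors of its n-grams.
dim = 8
rng = np.random.default_rng(0)
table = {}  # hypothetical n-gram -> vector lookup

def word_vector(word):
    grams = char_ngrams(word)
    for g in grams:
        table.setdefault(g, rng.standard_normal(dim))
    return sum(table[g] for g in grams)

print(sorted(char_ngrams("where", 3, 3)))  # ['<wh', '<where>', 'ere', 'her', 're>', 'whe']
print(word_vector("where").shape)          # (8,)
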
GloVe: Global Vectors for Word Representation
TLDR
A new global log-bilinear regression model is proposed that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
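
A minimal sketch of the weighted least-squares objective behind this model, for a single nonzero co-occurrence count, is given below. The weighting function and the x_max = 100, alpha = 0.75 values are those reported in the GloVe paper; the variable names and random vectors are illustrative.

# Sketch of the GloVe weighted least-squares objective for a single
# nonzero co-occurrence count X_ij. The weighting function parameters
# (x_max=100, alpha=0.75) are the values reported in the GloVe paper;
# the variable names and random vectors are illustrative.
import numpy as np

def glove_weight(x, x_max=100.0, alpha=0.75):
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_pair_loss(w_i, w_j_tilde, b_i, b_j_tilde, x_ij):
    # f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2
    err = w_i @ w_j_tilde + b_i + b_j_tilde - np.log(x_ij)
    return glove_weight(x_ij) * err ** 2

rng = np.random.default_rng(0)
w_i, w_j = rng.standard_normal(50), rng.standard_normal(50)
print(glove_pair_loss(w_i, w_j, 0.0, 0.0, x_ij=42.0))
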
Supervised and unsupervised methods for learning representations of linguistic units
TLDR
A supervised, graph-based method to create word representations is investigated, and a new calculus for interpretable ultradense subspaces, including polarity, concreteness, frequency, and part-of-speech (POS), is introduced.
Improved Word Embeddings with Implicit Structure Information
TLDR
This work introduces an extension to the continuous bag-of-words model for learning word representations efficiently by using implicit structure information, and computes weights representing the probabilities of syntactic relations based on the Huffman softmax tree with an efficient heuristic.
Distributed Representations of Sentences and Documents
TLDR
Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents, and its construction gives the algorithm the potential to overcome the weaknesses of bag-of-words models.
A New Method for the Construction of Evolving Embedded Representations of Words
TLDR
A new perspective on determining the semantic similarity of words over time is provided by constructing several word representations that reflect semantic change, built on top of recent embedding methods.
...
...

References

Showing 1-10 of 32 references
Semantic Compositionality through Recursive Matrix-Vector Spaces
TLDR
A recursive neural network model that learns compositional vector representations for phrases and sentences of arbitrary syntactic type and length and can learn the meaning of operators in propositional logic and natural language is introduced.
Linguistic Regularities in Continuous Space Word Representations
TLDR
The vector-space word representations implicitly learned by the input-layer weights are found to be surprisingly good at capturing syntactic and semantic regularities in language, with each relationship characterized by a relation-specific vector offset.
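
The relation-specific vector-offset idea can be illustrated with a small analogy routine: "a is to b as c is to ?" is answered by the vocabulary word whose vector is most cosine-similar to b - a + c. The routine below is a generic sketch over a random toy embedding table, not the authors' evaluation code.

# Sketch of the vector-offset analogy method summarized above.
# The toy embedding table is random and illustrative; with trained
# word vectors the classic example returns "queen".
import numpy as np

def analogy(emb, a, b, c):
    target = emb[b] - emb[a] + emb[c]
    best, best_sim = None, -np.inf
    for w, v in emb.items():
        if w in (a, b, c):
            continue
        sim = v @ target / (np.linalg.norm(v) * np.linalg.norm(target))
        if sim > best_sim:
            best, best_sim = w, sim
    return best

rng = np.random.default_rng(0)
emb = {w: rng.standard_normal(20) for w in
       ["king", "queen", "man", "woman", "apple"]}
# With real word2vec vectors this returns "queen"; with the random toy
# table above the output is arbitrary.
print(analogy(emb, "man", "king", "woman"))
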
Efficient Estimation of Word Representations in Vector Space
TLDR
Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.
A Neural Probabilistic Language Model
TLDR
This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.
A fast and simple algorithm for training neural probabilistic language models
TLDR
This work proposes a fast and simple algorithm for training NPLMs based on noise-contrastive estimation, a newly introduced procedure for estimating unnormalized continuous distributions, and demonstrates the scalability of the proposed approach by training several neural language models on a 47M-word corpus with an 80K-word vocabulary.
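
A rough sketch of a noise-contrastive estimation objective of the kind described above: observed (context, word) pairs are separated from k noise samples by logistic regression on the difference between the model's unnormalized log-score and the log of the scaled noise probability. The scores and noise probabilities fed in below are placeholders, and the code is a generic NCE sketch rather than the authors' training procedure.

# Generic sketch of the NCE loss for one (context, word) observation
# plus k noise samples. Model scores and noise probabilities are
# illustrative placeholders, not values from the paper.
import numpy as np

def log_sigmoid(x):
    return -np.logaddexp(0.0, -x)

def nce_loss(score_data, scores_noise, logp_noise_data, logp_noise_samples, k):
    # score_* are unnormalized model log-scores s_theta(w, context)
    loss = -log_sigmoid(score_data - np.log(k) - logp_noise_data)
    for s, lp in zip(scores_noise, logp_noise_samples):
        loss -= log_sigmoid(-(s - np.log(k) - lp))
    return loss

print(nce_loss(2.0, [-1.0, 0.5], np.log(0.01), [np.log(0.05), np.log(0.02)], k=2))
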
Distributional Semantics Beyond Words: Supervised Learning of Analogy and Paraphrase
TLDR
The main contribution of this paper is that combination functions are generated by supervised learning and achieve state-of-the-art results in measuring relational similarity between word pairs and measuring compositional similarity between noun-modifier phrases and unigrams.
Hierarchical Probabilistic Neural Network Language Model
TLDR
A hierarchical decomposition of the conditional probabilities, constrained by prior knowledge extracted from the WordNet semantic hierarchy, is introduced; it yields a speed-up of about 200x during both training and recognition.
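
The hierarchical decomposition can be sketched as a product of binary decisions along a word's path in a tree over the vocabulary, which is what reduces the per-word cost from O(V) to O(log V). The tiny hand-built path and scores below are illustrative; the paper's tree is derived from WordNet rather than built by hand.

# Sketch of a hierarchical softmax path probability: the probability
# of a word is a product of left/right decisions along its path in a
# tree over the vocabulary. Scores and path here are hand-made toys.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def path_probability(node_scores, directions):
    # node_scores[i]: score of the i-th internal node on the word's path
    # directions[i]:  +1 to take the "left" branch, -1 for "right"
    p = 1.0
    for s, d in zip(node_scores, directions):
        p *= sigmoid(d * s)
    return p

# A word reached by left, right, left with these node scores:
print(path_probability([0.4, -1.2, 2.0], [+1, -1, +1]))
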
Word Representations: A Simple and General Method for Semi-Supervised Learning
TLDR
This work evaluates Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking, and finds that each of the three word representations improves the accuracy of these baselines.
A Scalable Hierarchical Distributed Language Model
TLDR
A fast hierarchical language model, along with a simple feature-based algorithm for the automatic construction of word trees from data, is introduced, and it is shown that the resulting models can outperform non-hierarchical neural models as well as the best n-gram models.
Continuous space language models
...
...