• Corpus ID: 5959482

Efficient Estimation of Word Representations in Vector Space

@inproceedings{mikolov2013efficient,
  title={Efficient Estimation of Word Representations in Vector Space},
  author={Tomas Mikolov and Kai Chen and Gregory S. Corrado and Jeffrey Dean},
  booktitle={International Conference on Learning Representations},
  year={2013}
}
We propose two novel model architectures for computing continuous vector representations of words from very large data sets. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
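One of the two architectures the abstract refers to is the continuous bag-of-words (CBOW) model, which predicts a center word from the average of its context word vectors. The following is a minimal sketch of a CBOW forward pass only; the toy vocabulary, the random initialization, and the function name `cbow_probs` are illustrative choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat"]
V, D = len(vocab), 8                 # vocabulary size, embedding dimension
W_in = rng.normal(0, 0.1, (V, D))    # input (context) embeddings
W_out = rng.normal(0, 0.1, (V, D))   # output (center-word) embeddings
idx = {w: i for i, w in enumerate(vocab)}

def cbow_probs(context):
    """CBOW forward pass: average the context word vectors, then take a
    softmax over their dot products with every output vector."""
    h = W_in[[idx[w] for w in context]].mean(axis=0)
    scores = W_out @ h
    e = np.exp(scores - scores.max())  # numerically stable softmax
    return e / e.sum()

# Predict a distribution for the word between "the" and "sat".
p = cbow_probs(["the", "sat"])
print(p.sum())  # a proper distribution over the whole vocabulary
```

Training would adjust `W_in` and `W_out` to raise the probability of each observed center word; the skip-gram architecture inverts the direction, predicting context words from the center word.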


Rehabilitation of Count-Based Models for Word Vector Representations

A systematic study of the use of the Hellinger distance to extract semantic representations from the word co-occurrence statistics of large text corpora shows that this distance gives good performance on word similarity and analogy tasks, with a proper type and size of context, and a dimensionality reduction based on a stochastic low-rank approximation.
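The Hellinger distance mentioned above has a simple closed form for discrete distributions, such as normalized rows of a word co-occurrence matrix. A minimal sketch (toy distributions and the function name are ours):

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions:
    (1/sqrt(2)) * ||sqrt(p) - sqrt(q)||_2, ranging from 0 (identical)
    to 1 (disjoint support)."""
    return float(np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2))

# Toy co-occurrence distributions of two words over 4 context words.
p = np.array([0.5, 0.3, 0.1, 0.1])
q = np.array([0.4, 0.3, 0.2, 0.1])
print(hellinger(p, q))
```

Unlike Euclidean distance on raw counts, this metric operates on square-rooted probabilities, which damps the dominance of very frequent context words.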

Learning Word Vectors for 157 Languages

This paper describes how high quality word representations for 157 languages were trained on the free online encyclopedia Wikipedia and data from the Common Crawl project, and introduces three new word analogy datasets to evaluate these word vectors.

Enriching Word Vectors with Subword Information

A new approach based on the skip-gram model, in which each word is represented as a bag of character n-grams and the word vector is the sum of these n-gram representations, achieves state-of-the-art performance on word similarity and analogy tasks.
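The bag-of-character-n-grams idea can be sketched in a few lines. Below, boundary markers `<` and `>` distinguish prefixes and suffixes, and the whole word is kept as one extra unit; the function name and the 3–4 gram range are illustrative assumptions, not the paper's exact configuration.

```python
def char_ngrams(word, n_min=3, n_max=4):
    """Character n-grams of a word, with boundary markers, plus the
    whole (marked) word itself, as in subword skip-gram models."""
    w = f"<{word}>"
    grams = {w[i:i + n]
             for n in range(n_min, n_max + 1)
             for i in range(len(w) - n + 1)}
    grams.add(w)  # the full word is also a unit of its own
    return grams

print(sorted(char_ngrams("where")))
```

A word's vector is then the sum of the vectors of its n-grams, so rare and even unseen words receive representations by sharing subword units with known words.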

Improving Word Representations with Document Labels

This paper proposes to incorporate document labels into the learning process of word representations in two frameworks, neural networks and matrix factorization, and shows that these models capture semantic and syntactic information better than the original models.

Neural Vector Conceptualization for Word Vector Space Interpretation

This work introduces a new method to interpret arbitrary samples from a word vector space, using a neural model that conceptualizes word vectors by activating the higher-order concepts it recognizes in a given vector.

Can word vectors help corpus linguists?

This work assesses to what extent state-of-the-art word-vector semantics can help corpus linguists annotate large datasets for semantic classes.

Learning Word Representations with Hierarchical Sparse Coding

An efficient learning algorithm based on stochastic proximal methods that is significantly faster than previous approaches is shown, making it possible to perform hierarchical sparse coding on a corpus of billions of word tokens.

Improving Semantic Similarity of Words by Retrofitting Word Vectors in Sense Level

This paper uses semantic relations as positive and negative examples to re-train a pre-trained model, rather than integrating them into the objective function used during training, in order to improve the semantic distinction of words.

Learning word representations for Turkish

  • M. U. Sen, Hakan Erdogan
  • Computer Science
    2014 22nd Signal Processing and Communications Applications Conference (SIU)
  • 2014
The recently introduced skip-gram model improved performance on unsupervised learning of word embeddings that contain rich syntactic and semantic word relations, in terms of both accuracy and speed.

Improved Word Embeddings with Implicit Structure Information

This work introduces an extension to the continuous bag-of-words model for learning word representations efficiently by using implicit structure information, and computes weights representing probabilities of syntactic relations based on the Huffman softmax tree using an efficient heuristic.



Linguistic Regularities in Continuous Space Word Representations

The vector-space word representations that are implicitly learned by the input-layer weights are found to be surprisingly good at capturing syntactic and semantic regularities in language, and that each relationship is characterized by a relation-specific vector offset.
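The relation-specific vector offset described above is what makes analogies such as "king − man + woman ≈ queen" answerable by nearest-neighbor search. A minimal sketch with hand-picked 2-d toy vectors (real embeddings have hundreds of dimensions; all names and values here are illustrative):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def most_similar(query, vectors, exclude=()):
    """Word whose vector is closest to `query` by cosine similarity,
    skipping the words used to form the query."""
    best, best_sim = None, -2.0
    for word, vec in vectors.items():
        if word in exclude:
            continue
        sim = cosine(query, vec)
        if sim > best_sim:
            best, best_sim = word, sim
    return best

# Toy vectors in which dimension 0 ~ "royalty" and dimension 1 ~ "male".
vectors = {
    "king":  np.array([0.9, 0.8]),
    "queen": np.array([0.9, 0.2]),
    "man":   np.array([0.1, 0.9]),
    "woman": np.array([0.1, 0.3]),
}

# Vector-offset method: king - man + woman should land near queen.
query = vectors["king"] - vectors["man"] + vectors["woman"]
print(most_similar(query, vectors, exclude={"king", "man", "woman"}))  # queen
```

Excluding the three query words is the standard convention, since the query vector is usually closest to one of its own inputs.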

Strategies for training large scale neural network language models

This work describes how to effectively train neural network based language models on large data sets and introduces hash-based implementation of a maximum entropy model, that can be trained as a part of the neural network model.

Word Representations: A Simple and General Method for Semi-Supervised Learning

This work evaluates Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking, and finds that each of the three word representations improves the accuracy of these baselines.

A Neural Probabilistic Language Model

This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.

A fast and simple algorithm for training neural probabilistic language models

This work proposes a fast and simple algorithm for training NPLMs based on noise-contrastive estimation, a newly introduced procedure for estimating unnormalized continuous distributions, and demonstrates the scalability of the proposed approach by training several neural language models on a 47M-word corpus with an 80K-word vocabulary.

Natural Language Processing (Almost) from Scratch

We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling.

Hierarchical Probabilistic Neural Network Language Model

A hierarchical decomposition of the conditional probabilities, constrained by prior knowledge extracted from the WordNet semantic hierarchy, is introduced, yielding a speed-up of about 200 during both training and recognition.
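The speed-up comes from replacing a softmax over the whole vocabulary with a sequence of binary decisions along a word's root-to-leaf path in a tree, so each prediction costs O(log V) sigmoid evaluations instead of O(V). A minimal sketch of the leaf-probability computation (the tree, vectors, and function name are illustrative assumptions; the paper derives the tree from WordNet):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hierarchical_prob(h, path):
    """P(word | context) under a hierarchical softmax: the product of
    binary (sigmoid) decisions at each inner node on the word's
    root-to-leaf path. `path` is a list of (node_vector, direction)
    pairs with direction +1 (left) or -1 (right)."""
    p = 1.0
    for node_vec, direction in path:
        p *= sigmoid(direction * np.dot(node_vec, h))
    return p

rng = np.random.default_rng(1)
h = rng.normal(size=4)                 # hidden (context) vector
# A depth-2 path: two inner-node vectors and the left/right choices.
path = [(rng.normal(size=4), +1), (rng.normal(size=4), -1)]
print(hierarchical_prob(h, path))      # a value in (0, 1)
```

Because sigmoid(x) + sigmoid(-x) = 1 at every inner node, the probabilities of all leaves sum to 1, so the decomposition is a valid distribution over the vocabulary.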

Neural network based language models for highly inflective languages

Improvements obtained in recognition of spoken Czech lectures using language models based on neural networks, compared against modified Kneser-Ney smoothing, are described.

Continuous space language models

The Microsoft Research Sentence Completion Challenge

This work presents the MSR Sentence Completion Challenge Data, which consists of 1,040 sentences, each of which has four impostor sentences, in which a single (fixed) word in the original sentence has been replaced by an impostor word with similar occurrence statistics.