Corpus ID: 5959482

Efficient Estimation of Word Representations in Vector Space

@inproceedings{Mikolov2013EfficientEO,
  title={Efficient Estimation of Word Representations in Vector Space},
  author={Tomas Mikolov and Kai Chen and G. Corrado and J. Dean},
  booktitle={ICLR},
  year={2013}
}
We propose two novel model architectures for computing continuous vector representations of words from very large data sets. [...] Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
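The two architectures proposed in the paper are the continuous bag-of-words (CBOW) and skip-gram models. As a minimal, purely illustrative sketch (not the paper's code), both can be trained with the gensim library; parameter names assume gensim 4.x:

    # Illustrative sketch: training the two word2vec architectures with gensim.
    from gensim.models import Word2Vec

    sentences = [["the", "cat", "sat", "on", "the", "mat"],
                 ["dogs", "and", "cats", "are", "animals"]]

    # sg=0 selects the continuous bag-of-words (CBOW) architecture,
    # sg=1 selects the skip-gram architecture.
    cbow = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=0)
    skipgram = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)

    print(skipgram.wv["cat"].shape)  # (100,)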
Rehabilitation of Count-Based Models for Word Vector Representations
A systematic study of the use of the Hellinger distance to extract semantic representations from the word co-occurrence statistics of large text corpora shows that this distance gives good performance on word similarity and analogy tasks, with a proper type and size of context, and a dimensionality reduction based on a stochastic low-rank approximation.
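As an illustrative sketch (not the paper's implementation), the Hellinger distance between the context co-occurrence distributions of two words can be computed as follows; a low-rank approximation of the resulting representation would then provide the dimensionality reduction mentioned above:

    # Sketch: Hellinger distance between two words' co-occurrence distributions.
    import numpy as np

    def hellinger(p, q):
        # p, q: co-occurrence counts over the same context vocabulary,
        # normalized here to probability distributions.
        p = p / p.sum()
        q = q / q.sum()
        return np.sqrt(0.5) * np.linalg.norm(np.sqrt(p) - np.sqrt(q))

    p = np.array([3.0, 1.0, 0.0, 6.0])  # hypothetical counts for word A
    q = np.array([2.0, 2.0, 1.0, 5.0])  # hypothetical counts for word B
    print(hellinger(p, q))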
Learning Word Vectors for 157 Languages
This paper describes how high-quality word representations for 157 languages were trained on the free online encyclopedia Wikipedia and data from the Common Crawl project, and introduces three new word analogy datasets to evaluate these word vectors.
Enriching Word Vectors with Subword Information
A new approach based on the skip-gram model, where each word is represented as a bag of character n-grams and a word's vector is the sum of these n-gram representations, achieves state-of-the-art performance on word similarity and analogy tasks.
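A minimal sketch of the subword idea, assuming randomly initialized n-gram vectors in place of the ones the skip-gram model would learn:

    # Sketch: a word vector as the sum of its character n-gram vectors.
    import numpy as np

    def char_ngrams(word, n_min=3, n_max=6):
        # Angle brackets mark word boundaries, as described in the paper.
        w = "<" + word + ">"
        return [w[i:i + n] for n in range(n_min, n_max + 1)
                           for i in range(len(w) - n + 1)]

    dim = 50
    rng = np.random.default_rng(0)
    ngram_vectors = {}  # hypothetical lookup table of learned n-gram vectors

    def word_vector(word):
        grams = char_ngrams(word)
        for g in grams:
            ngram_vectors.setdefault(g, rng.normal(size=dim))
        return sum(ngram_vectors[g] for g in grams)

    print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']
    print(word_vector("where").shape)  # (50,)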
Improving Word Representations with Document Labels
This paper proposes to incorporate document labels into the learning of word representations in two frameworks, neural networks and matrix factorization, and shows that the resulting models capture semantic and syntactic information better than the original models.
Neural Vector Conceptualization for Word Vector Space Interpretation
This work introduces a new method to interpret arbitrary samples from a word vector space, using a neural model to conceptualize word vectors: the model activates the higher-order concepts it recognizes in a given vector.
GloVe: Global Vectors for Word Representation
A new global log-bilinear regression model combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
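A sketch of the GloVe weighted least-squares objective the summary refers to (notation follows the paper; this is not the reference implementation):

    # Sketch: GloVe loss over the non-zero entries of a co-occurrence matrix.
    import numpy as np

    def weight(x, x_max=100.0, alpha=0.75):
        # Weighting function that caps the influence of very frequent pairs.
        return (x / x_max) ** alpha if x < x_max else 1.0

    def glove_loss(X, W, W_ctx, b, b_ctx):
        # X: co-occurrence counts; W, W_ctx: word and context vectors;
        # b, b_ctx: word and context biases.
        loss = 0.0
        for i, j in zip(*np.nonzero(X)):
            diff = W[i] @ W_ctx[j] + b[i] + b_ctx[j] - np.log(X[i, j])
            loss += weight(X[i, j]) * diff ** 2
        return loss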
Can word vectors help corpus linguists?
This work assesses to what extent state-of-the-art word-vector semantics can help corpus linguists annotate large datasets for semantic classes.
Measuring Word Significance using Distributed Representations of Words
It is proposed to use the length of word vectors, together with term frequency, as a measure of word significance in a corpus, to help extract syntactic and semantic features from large text corpora.
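A minimal sketch of such a significance score, assuming a simple product of vector length and term frequency (the paper's exact combination may differ):

    # Sketch: word significance from vector length and term frequency.
    import numpy as np

    def significance(vector, term_frequency):
        # Hypothetical combination: frequency-weighted vector norm.
        return np.linalg.norm(vector) * term_frequency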
Learning Word Representations with Hierarchical Sparse Coding
An efficient learning algorithm based on stochastic proximal methods is presented that is significantly faster than previous approaches, making it possible to perform hierarchical sparse coding on a corpus of billions of word tokens.
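As an illustrative sketch of the proximal machinery involved, a stochastic proximal step with a plain L1 penalty looks as follows; the hierarchical penalty used in the paper requires a more elaborate proximal operator:

    # Sketch: one stochastic proximal gradient step with an L1 penalty.
    import numpy as np

    def soft_threshold(v, lam):
        # Proximal operator of the L1 norm (soft-thresholding).
        return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

    def proximal_sgd_step(a, grad, step, lam):
        # a: current sparse code; grad: stochastic gradient of the smooth loss.
        return soft_threshold(a - step * grad, step * lam)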
Learning word representations for Turkish
  • M. U. Sen, Hakan Erdogan
  • 2014 22nd Signal Processing and Communications Applications Conference (SIU), 2014
The recently introduced skip-gram model improved performance on unsupervised learning of word embeddings that contain rich syntactic and semantic word relations, in terms of both accuracy and speed.

References

Showing 1-10 of 43 references.
Improving Word Representations via Global Context and Multiple Word Prototypes
A new neural network architecture is presented which learns word embeddings that better capture the semantics of words by incorporating both local and global document context, and accounts for homonymy and polysemy by learning multiple embeddings per word.
Distributed Representations of Words and Phrases and their Compositionality
This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
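A sketch of the negative sampling objective for a single (input word, context word) pair, assuming k negative words drawn from a noise distribution:

    # Sketch: negative sampling loss for one training pair.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def negative_sampling_loss(v_input, v_context, v_negatives):
        # v_input: input word vector; v_context: output vector of the true
        # context word; v_negatives: output vectors of k sampled noise words.
        loss = -np.log(sigmoid(v_context @ v_input))
        for v_neg in v_negatives:
            loss -= np.log(sigmoid(-v_neg @ v_input))
        return loss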
Linguistic Regularities in Continuous Space Word Representations
The vector-space word representations that are implicitly learned by the input-layer weights are found to be surprisingly good at capturing syntactic and semantic regularities in language, and each relationship is characterized by a relation-specific vector offset.
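A sketch of the vector-offset analogy test described above, assuming unit-normalized embeddings:

    # Sketch: answer "a is to b as c is to ?" with the vector nearest to b - a + c.
    import numpy as np

    def analogy(embeddings, a, b, c):
        # embeddings: dict mapping words to unit-normalized vectors (assumed).
        target = embeddings[b] - embeddings[a] + embeddings[c]
        target = target / np.linalg.norm(target)
        best, best_sim = None, -np.inf
        for word, vec in embeddings.items():
            if word in (a, b, c):
                continue
            sim = vec @ target  # cosine similarity for unit vectors
            if sim > best_sim:
                best, best_sim = word, sim
        return best  # e.g. analogy(E, "man", "king", "woman") -> "queen"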
Strategies for training large scale neural network language models
This work describes how to effectively train neural-network-based language models on large data sets and introduces a hash-based implementation of a maximum entropy model that can be trained as part of the neural network model.
Word Representations: A Simple and General Method for Semi-Supervised Learning
This work evaluates Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking, and finds that each of the three word representations improves the accuracy of these baselines.
A Neural Probabilistic Language Model
This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.
A fast and simple algorithm for training neural probabilistic language models
This work proposes a fast and simple algorithm for training NPLMs based on noise-contrastive estimation, a newly introduced procedure for estimating unnormalized continuous distributions, and demonstrates the scalability of the proposed approach by training several neural language models on a 47M-word corpus with an 80K-word vocabulary.
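An illustrative sketch of the noise-contrastive estimation objective for one prediction, using unnormalized model scores (function and argument names here are hypothetical):

    # Sketch: NCE loss that classifies the observed word against k noise samples.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def nce_loss(score_data, scores_noise, logq_data, logq_noise, k):
        # score_data: model log-score (unnormalized) of the observed word;
        # scores_noise: model log-scores of the k sampled noise words;
        # logq_*: log-probabilities of those words under the noise distribution.
        p_data = sigmoid(score_data - np.log(k) - logq_data)
        loss = -np.log(p_data)
        for s, lq in zip(scores_noise, logq_noise):
            loss -= np.log(1.0 - sigmoid(s - np.log(k) - lq))
        return loss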
Natural Language Processing (Almost) from Scratch
We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, and named entity recognition. [...]
Hierarchical Probabilistic Neural Network Language Model
A hierarchical decomposition of the conditional probabilities that yields a speed-up of about 200 both during training and recognition, constrained by the prior knowledge extracted from the WordNet semantic hierarchy, is introduced.
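A sketch of the hierarchical decomposition idea: the probability of a word is a product of binary decisions along its path in a tree over the vocabulary (the tree, node vectors, and branch signs are assumed to be given, e.g. derived from the WordNet hierarchy):

    # Sketch: word probability as a product of binary decisions along a tree path.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def word_probability(h, path):
        # h: hidden/context vector; path: list of (node_vector, sign) pairs,
        # with sign = +1 for one branch direction and -1 for the other.
        p = 1.0
        for node_vec, sign in path:
            p *= sigmoid(sign * (node_vec @ h))
        return p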
Neural network based language models for highly inflective languages
Improvements obtained in recognition of spoken Czech lectures using neural-network-based language models together with modified Kneser-Ney smoothing are described.