Efficient Estimation of Word Representations in Vector Space
@inproceedings{Mikolov2013EfficientEO, title={Efficient Estimation of Word Representations in Vector Space}, author={Tomas Mikolov and Kai Chen and Gregory S. Corrado and Jeffrey Dean}, booktitle={International Conference on Learning Representations}, year={2013} }
We propose two novel model architectures for computing continuous vector representations of words from very large data sets. […] Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
24,813 Citations
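The two architectures proposed in the paper, continuous bag-of-words (CBOW) and skip-gram, are available in several open-source re-implementations. As an illustration only, a minimal sketch using gensim's Word2Vec (an independent re-implementation, not the paper's original code) might look like the following; the toy corpus and hyperparameter values are placeholders.

```python
# Minimal sketch: training CBOW and skip-gram embeddings with gensim,
# an independent re-implementation of the architectures described above.
# The toy corpus and hyperparameters are illustrative placeholders.
from gensim.models import Word2Vec

corpus = [
    ["the", "king", "rules", "the", "country"],
    ["the", "queen", "rules", "the", "country"],
    ["a", "man", "and", "a", "woman", "walk"],
]

# sg=0 selects CBOW, sg=1 selects skip-gram (gensim >= 4.0 API).
cbow = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0, epochs=50)
skipgram = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

# Query nearest neighbours in the learned vector space.
print(skipgram.wv.most_similar("king", topn=3))
```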
Rehabilitation of Count-Based Models for Word Vector Representations
- Computer Science · CICLing
- 2015
A systematic study of the use of the Hellinger distance to extract semantic representations from the word co-occurrence statistics of large text corpora shows that this distance gives good performance on word similarity and analogy tasks, given an appropriate type and size of context and a dimensionality reduction based on a stochastic low-rank approximation.
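The Hellinger distance mentioned above is a standard distance between discrete probability distributions; a minimal numpy sketch, with an invented pair of co-occurrence distributions, is shown below.

```python
# Minimal sketch: Hellinger distance between two words' co-occurrence
# distributions over a shared context vocabulary. The counts are invented.
import numpy as np

def hellinger(p, q):
    """H(p, q) = (1 / sqrt(2)) * ||sqrt(p) - sqrt(q)||_2 for probability vectors p, q."""
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2.0)

counts_a = np.array([10.0, 3.0, 0.0, 7.0])  # co-occurrence counts of word A with 4 contexts
counts_b = np.array([8.0, 5.0, 1.0, 6.0])   # co-occurrence counts of word B

p = counts_a / counts_a.sum()
q = counts_b / counts_b.sum()
print(hellinger(p, q))  # small distance -> distributionally similar words
```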
Learning Word Vectors for 157 Languages
- Computer Science · LREC
- 2018
This paper describes how high-quality word representations for 157 languages were trained on the free online encyclopedia Wikipedia and data from the Common Crawl project, and introduces three new word analogy datasets to evaluate these word vectors.
Enriching Word Vectors with Subword Information
- Computer Science · TACL
- 2017
A new approach based on the skip-gram model is proposed, in which each word is represented as a bag of character n-grams and the word vector is the sum of these n-gram representations; it achieves state-of-the-art performance on word similarity and analogy tasks.
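The "bag of character n-grams" idea can be sketched as: pad a word with boundary markers, extract its n-grams, and represent the word as the sum of the vectors associated with those n-grams (plus the full word itself). The hash-bucket embedding table below is an illustrative stand-in for a trained model, not that paper's implementation.

```python
# Minimal sketch of the subword idea: a word vector is the sum of vectors for
# its character n-grams (with < and > as word boundary markers).
# The randomly initialised hash-bucket table stands in for learned parameters.
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    padded = "<" + word + ">"
    grams = [padded]  # the full word is kept as one unit as well
    for n in range(n_min, n_max + 1):
        grams += [padded[i:i + n] for i in range(len(padded) - n + 1)]
    return grams

rng = np.random.default_rng(0)
num_buckets, dim = 2_000, 50
ngram_table = rng.normal(size=(num_buckets, dim))  # would be learned in practice

def word_vector(word):
    # Python's built-in hash is process-dependent; a real system would use a fixed hash.
    ids = [hash(g) % num_buckets for g in char_ngrams(word)]
    return ngram_table[ids].sum(axis=0)

print(char_ngrams("where", 3, 3))   # ['<where>', '<wh', 'whe', 'her', 'ere', 're>']
print(word_vector("where").shape)   # (50,)
```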
Improving Word Representations with Document Labels
- Computer Science · IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2017
This paper proposes to incorporate document labels into the learning of word representations in two frameworks, neural networks and matrix factorization, and shows that these models capture semantic and syntactic information better than the original models.
Neural Vector Conceptualization for Word Vector Space Interpretation
- Computer Science · Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP
- 2019
This work introduces a new method to interpret arbitrary samples from a word vector space, using a neural model that conceptualizes word vectors, i.e., activates the higher-order concepts it recognizes in a given vector.
Can word vectors help corpus linguists?
- Computer Science · Studia Neophilologica
- 2019
This work assesses to what extent state-of-the-art word-vector semantics can help corpus linguists annotate large datasets for semantic classes.
Learning Word Representations with Hierarchical Sparse Coding
- Computer Science · ICML
- 2015
An efficient learning algorithm based on stochastic proximal methods is shown to be significantly faster than previous approaches, making it possible to perform hierarchical sparse coding on a corpus of billions of word tokens.
Improving Semantic Similarity of Words by Retrofitting Word Vectors in Sense Level
- Computer Science · ICAART
- 2020
This paper uses semantic relations as positive and negative examples to retrofit the output of a pre-trained model at the sense level, rather than integrating those relations into the objective function used during training, in order to improve the semantic distinction of words.
Learning word representations for Turkish
- Computer Science · 2014 22nd Signal Processing and Communications Applications Conference (SIU)
- 2014
The recently introduced skip-gram model improves unsupervised learning of word embeddings that contain rich syntactic and semantic word relations, in terms of both accuracy and speed.
Improved Word Embeddings with Implicit Structure Information
- Computer Science · COLING
- 2016
This work introduces an extension to the continuous bag-of-words model for learning word representations efficiently by using implicit structure information, computing weights that represent probabilities of syntactic relations based on the Huffman softmax tree with an efficient heuristic.
References
Showing 1–10 of 36 references
Linguistic Regularities in Continuous Space Word Representations
- Computer Science · NAACL
- 2013
The vector-space word representations that are implicitly learned by the input-layer weights are found to be surprisingly good at capturing syntactic and semantic regularities in language, with each relationship characterized by a relation-specific vector offset.
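The "relation-specific vector offset" observation is what underlies the familiar analogy arithmetic. As a toy illustration (with invented vectors; meaningful offsets only emerge after training), answering "man is to king as woman is to ?" amounts to a nearest-neighbour search around (king - man + woman) under cosine similarity:

```python
# Minimal sketch of analogy via vector offsets: answer "a : b :: c : ?" by taking
# the nearest neighbour of (b - a + c) under cosine similarity.
# The embedding matrix here is invented for illustration.
import numpy as np

vocab = ["king", "queen", "man", "woman", "apple"]
rng = np.random.default_rng(1)
emb = {w: rng.normal(size=8) for w in vocab}

def analogy(a, b, c, exclude=None):
    target = emb[b] - emb[a] + emb[c]
    target /= np.linalg.norm(target)
    exclude = set(exclude or [a, b, c])
    best, best_sim = None, -1.0
    for w, v in emb.items():
        if w in exclude:
            continue
        sim = v @ target / np.linalg.norm(v)
        if sim > best_sim:
            best, best_sim = w, sim
    return best

# With trained vectors this query would typically return "queen".
print(analogy("man", "king", "woman"))
```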
Strategies for training large scale neural network language models
- Computer Science · 2011 IEEE Workshop on Automatic Speech Recognition & Understanding
- 2011
This work describes how to effectively train neural network based language models on large data sets and introduces a hash-based implementation of a maximum entropy model that can be trained as part of the neural network model.
Word Representations: A Simple and General Method for Semi-Supervised Learning
- Computer Science · ACL
- 2010
This work evaluates Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking, and finds that each of the three word representations improves the accuracy of these baselines.
A Neural Probabilistic Language Model
- Computer Science · J. Mach. Learn. Res.
- 2003
This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.
A fast and simple algorithm for training neural probabilistic language models
- Computer Science · ICML
- 2012
This work proposes a fast and simple algorithm for training NPLMs based on noise-contrastive estimation, a newly introduced procedure for estimating unnormalized continuous distributions, and demonstrates the scalability of the proposed approach by training several neural language models on a 47M-word corpus with an 80K-word vocabulary.
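Noise-contrastive estimation turns density estimation into a binary classification problem: for each observed (context, word) pair, the model learns to distinguish the observed word from k samples drawn from a noise distribution. A minimal sketch of that per-example objective is below; the scores and noise probabilities are invented numbers, not values from that paper.

```python
# Minimal sketch of the noise-contrastive estimation (NCE) objective for one
# (context, word) pair: distinguish the observed word from k noise samples.
# All numeric values here are invented for illustration.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nce_loss(score_data, scores_noise, q_data, q_noise, k):
    """Negative log-likelihood of the binary data-vs-noise classification.

    score_* are unnormalized model log-scores s(w, h); q_* are probabilities of
    the corresponding words under the noise distribution.
    """
    loss = -np.log(sigmoid(score_data - np.log(k * q_data)))
    for s, q in zip(scores_noise, q_noise):
        loss -= np.log(sigmoid(-(s - np.log(k * q))))
    return loss

# Toy numbers: 1 observed word and k = 2 noise words.
print(nce_loss(score_data=2.0, scores_noise=[0.1, -0.5],
               q_data=0.01, q_noise=[0.02, 0.005], k=2))
```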
Natural Language Processing (Almost) from Scratch
- Computer Science · J. Mach. Learn. Res.
- 2011
We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling.
Hierarchical Probabilistic Neural Network Language Model
- Computer Science · AISTATS
- 2005
A hierarchical decomposition of the conditional probabilities, constrained by prior knowledge extracted from the WordNet semantic hierarchy, is introduced, yielding a speed-up of about 200× both during training and recognition.
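The core idea of such a hierarchical decomposition (later popularized as the hierarchical softmax) is that the probability of a word is the product of binary decisions along its path in a tree over the vocabulary, so only O(log V) decisions are evaluated instead of V output units. The tiny tree, codes, and node vectors below are invented purely for illustration:

```python
# Minimal sketch of a hierarchical decomposition of P(word | context): the
# probability of a word is a product of binary left/right decisions along its
# path in a binary tree over the vocabulary. Tree and parameters are invented.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
dim = 16
# For each word: the internal-node ids on its path and the binary code
# (0 = go left, 1 = go right) taken at each of those nodes.
paths = {"cat": ([0, 1], [0, 1]), "dog": ([0, 1], [0, 0]), "the": ([0], [1])}
node_vecs = rng.normal(size=(2, dim))  # one vector per internal node (learned in practice)

def p_word_given_context(word, context_vec):
    nodes, code = paths[word]
    p = 1.0
    for n, bit in zip(nodes, code):
        p_right = sigmoid(node_vecs[n] @ context_vec)
        p *= p_right if bit == 1 else (1.0 - p_right)
    return p

ctx = rng.normal(size=dim)
# The probabilities over the leaves of the tree sum to 1 by construction.
print({w: round(p_word_given_context(w, ctx), 4) for w in paths})
```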
Neural network based language models for highly inflective languages
- Computer Science · 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2009
Improvements obtained in the recognition of spoken Czech lectures using neural network based language models, compared with baselines using modified Kneser-Ney smoothing, are described.
The Microsoft Research Sentence Completion Challenge
- Computer Science
- 2011
This work presents the MSR Sentence Completion Challenge Data, which consists of 1,040 sentences, each with four impostor sentences in which a single (fixed) word of the original sentence has been replaced by an impostor word with similar occurrence statistics.