Distributed Representations of Words and Phrases and their Compositionality
This paper presents a simple method for finding phrases in text, shows that it is possible to learn good vector representations for millions of phrases, and describes a simple alternative to the hierarchical softmax called negative sampling.
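As a rough illustration of the phrase-finding idea, the sketch below scores adjacent word pairs by a discounted count ratio and merges high-scoring pairs into single tokens. The discount `delta` and the `threshold` are illustrative placeholders, not values from the paper, which also runs several merge passes with decreasing thresholds.

```python
from collections import Counter

def find_phrases(tokens, delta=5, threshold=1e-4):
    """Merge adjacent word pairs whose discounted count ratio exceeds a threshold.
    delta discounts the score of pairs built from very rare words."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    phrases = []
    for (w1, w2), n in bigrams.items():
        score = (n - delta) / (unigrams[w1] * unigrams[w2])
        if score > threshold:
            phrases.append(f"{w1}_{w2}")
    return phrases
```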
Efficient Estimation of Word Representations in Vector Space
- Tomas Mikolov, Kai Chen, G. Corrado, J. Dean
- Computer Science · International Conference on Learning Representations
- 16 January 2013
Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed, and these vectors are shown to provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.
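A minimal sketch of one of the two architectures (CBOW), assuming randomly initialized embedding matrices and hypothetical word ids: the context vectors are averaged and every vocabulary word is scored as a candidate center word. The full softmax is shown only for clarity; the paper avoids the O(V) cost with hierarchical softmax.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 10_000, 300                            # vocabulary size and dimension (illustrative)
W_in = rng.normal(scale=0.01, size=(V, d))    # input (context-word) embeddings
W_out = rng.normal(scale=0.01, size=(V, d))   # output (center-word) embeddings

def cbow_probs(context_ids):
    """Average the context vectors, then score all words as the center word."""
    h = W_in[context_ids].mean(axis=0)        # projection layer
    logits = W_out @ h                        # one score per vocabulary word
    e = np.exp(logits - logits.max())
    return e / e.sum()

probs = cbow_probs([12, 7, 431, 9])           # hypothetical context word ids
```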
Distributed Representations of Sentences and Documents
Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of text, such as sentences, paragraphs, and documents; its construction gives the algorithm the potential to overcome the weaknesses of bag-of-words models.
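A minimal usage sketch with gensim's Doc2Vec, an independent implementation of Paragraph Vector; the toy corpus and hyperparameter values are illustrative only.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    TaggedDocument(words=["the", "cat", "sat"], tags=["doc0"]),
    TaggedDocument(words=["dogs", "chase", "cats"], tags=["doc1"]),
]
model = Doc2Vec(corpus, vector_size=50, min_count=1, epochs=40)

# infer a fixed-length vector for an unseen, variable-length piece of text
vec = model.infer_vector(["a", "cat", "and", "a", "dog"])
```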
Enriching Word Vectors with Subword Information
- Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov
- Computer Science · Transactions of the Association for Computational Linguistics
- 15 July 2016
A new approach based on the skip-gram model is proposed in which each word is represented as a bag of character n-grams and a word's vector is the sum of its n-gram representations; it achieves state-of-the-art performance on word similarity and analogy tasks.
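The subword decomposition can be sketched as follows: a word is wrapped in boundary markers, split into character n-grams (here n from 3 to 6, the range used in the paper), and its vector is the sum of the n-gram vectors. The `ngram_vectors` lookup in the comment is hypothetical.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams with boundary markers; the full word is kept as one extra unit."""
    w = f"<{word}>"
    grams = [w[i:i + n] for n in range(n_min, n_max + 1)
             for i in range(len(w) - n + 1)]
    return grams + [w]

# char_ngrams("where") -> ['<wh', 'whe', 'her', 'ere', 're>', '<whe', ..., '<where>']

# Word vector as the sum of n-gram vectors (ngram_vectors is a hypothetical dict):
# word_vec = sum(ngram_vectors[g] for g in char_ngrams("where"))
```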
Linguistic Regularities in Continuous Space Word Representations
- Tomas Mikolov, Wen-tau Yih, G. Zweig
- Computer Science · North American Chapter of the Association for Computational Linguistics
- 27 May 2013
The vector-space word representations implicitly learned by the input-layer weights are found to be surprisingly good at capturing syntactic and semantic regularities in language, with each relationship characterized by a relation-specific vector offset.
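The vector-offset idea can be demonstrated with plain vector arithmetic plus a cosine-similarity search; `embeddings` below is a hypothetical word-to-vector dictionary.

```python
import numpy as np

def most_similar(query_vec, embeddings, exclude=()):
    """Return the word whose vector has the highest cosine similarity to query_vec."""
    best, best_sim = None, -1.0
    q = query_vec / np.linalg.norm(query_vec)
    for word, vec in embeddings.items():
        if word in exclude:
            continue
        sim = q @ (vec / np.linalg.norm(vec))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

# The classic offset example, "king - man + woman ~= queen":
# answer = most_similar(embeddings["king"] - embeddings["man"] + embeddings["woman"],
#                       embeddings, exclude={"king", "man", "woman"})
```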
Recurrent neural network based language model
Results indicate that a roughly 50% reduction in perplexity is possible by using a mixture of several RNN language models, compared to a state-of-the-art backoff language model.
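For reference, a mixture of language models is typically evaluated by linearly interpolating their per-word probabilities and computing perplexity over the test set, roughly as in this illustrative sketch (data layout and weights are assumptions, not the paper's setup).

```python
import math

def mixture_perplexity(per_word_probs, weights):
    """Perplexity of a linear interpolation of language models.
    per_word_probs[m][t] is model m's probability for the t-th test word;
    weights are interpolation coefficients that sum to 1."""
    n_words = len(per_word_probs[0])
    log_prob = 0.0
    for t in range(n_words):
        p = sum(w * probs[t] for w, probs in zip(weights, per_word_probs))
        log_prob += math.log(p)
    return math.exp(-log_prob / n_words)
```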
Bag of Tricks for Efficient Text Classification
- Armand Joulin, Edouard Grave, Piotr Bojanowski, Tomas Mikolov
- Computer Science · Conference of the European Chapter of the Association for Computational Linguistics
- 6 July 2016
A simple and efficient baseline for text classification is explored, showing that the fastText classifier is often on par with deep learning classifiers in accuracy while being many orders of magnitude faster for training and evaluation.
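A minimal usage sketch with the `fasttext` Python bindings; the file names, label format, and hyperparameter values are illustrative. Training data is expected as one `__label__<class> text` line per example.

```python
import fasttext

# supervised classifier with word bigram features (wordNgrams=2)
model = fasttext.train_supervised(input="train.txt", epoch=5, lr=0.5, wordNgrams=2)

labels, probs = model.predict("this text classifier is fast and accurate")
model.save_model("classifier.bin")
```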
Exploiting Similarities among Languages for Machine Translation
This method translates missing word and phrase entries by learning language structures from large monolingual data and a mapping between languages from small bilingual data; it uses distributed representations of words and learns a linear mapping between the vector spaces of the two languages.
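The linear mapping can be sketched as an ordinary least-squares fit over a small bilingual dictionary of paired word vectors; the paper itself learns the matrix with stochastic gradient descent, so this is only an approximation of the idea.

```python
import numpy as np

def fit_translation_matrix(X_src, Y_tgt):
    """Fit W so that W @ x_src ~= y_tgt for the dictionary pairs.
    Rows of X_src / Y_tgt are embeddings of paired source/target words."""
    B, *_ = np.linalg.lstsq(X_src, Y_tgt, rcond=None)   # solves X_src @ B ~= Y_tgt
    return B.T                                          # W @ x maps source -> target space

# An unseen source word is then translated by mapping its vector with W and
# taking the nearest neighbour among target-language word vectors (not shown).
```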
On the difficulty of training recurrent neural networks
- Razvan Pascanu, Tomas Mikolov, Yoshua Bengio
- Computer ScienceInternational Conference on Machine Learning
- 21 November 2012
This paper proposes a gradient norm clipping strategy to deal with exploding gradients and a soft constraint for the vanishing-gradients problem, and empirically validates the hypothesis and the proposed solutions.
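Gradient norm clipping rescales the whole gradient whenever its global norm exceeds a threshold, as in this small sketch (the threshold value is illustrative). Deep-learning frameworks ship the same idea, e.g. PyTorch's `torch.nn.utils.clip_grad_norm_`.

```python
import numpy as np

def clip_gradient_norm(grads, max_norm=1.0):
    """Rescale all gradient arrays when their global L2 norm exceeds max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads
```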
DeViSE: A Deep Visual-Semantic Embedding Model
This paper presents a new deep visual-semantic embedding model trained to identify visual objects using both labeled image data and semantic information gleaned from unannotated text, and shows that this semantic information can be exploited to make predictions about tens of thousands of image labels not observed during training.
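A simplified sketch in the spirit of the DeViSE objective: a projected image vector should score higher with its true label's word vector than with other labels' vectors by at least a margin. The function assumes the image features have already been projected into the word-embedding space, and the margin value is illustrative.

```python
def devise_hinge_loss(image_vec, true_label_vec, other_label_vecs, margin=0.1):
    """Hinge rank loss over incorrect labels for one projected image vector."""
    true_score = image_vec @ true_label_vec
    losses = [max(0.0, margin - true_score + image_vec @ v)
              for v in other_label_vecs]
    return sum(losses)
```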