Corpus ID: 7478738

Linguistic Regularities in Continuous Space Word Representations

@inproceedings{mikolov2013linguistic,
  title={Linguistic Regularities in Continuous Space Word Representations},
  author={Tomas Mikolov and Wen-tau Yih and Geoffrey Zweig},
  booktitle={North American Chapter of the Association for Computational Linguistics},
  year={2013}
}
Continuous space language models have recently demonstrated outstanding results across a variety of tasks. This paper shows that the word representations learned by such models capture syntactic and semantic regularities as relation-specific vector offsets, so that, for example, "King" - "Man" + "Woman" yields a vector close to "Queen". Remarkably, when applied to the SemEval-2012 Task 2 questions, this vector-offset method outperforms the best previous systems.
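The vector-offset method the abstract describes can be sketched in a few lines. The 4-dimensional embeddings below are made-up toy values chosen so the arithmetic works out exactly; a trained model would supply real vectors:

```python
import numpy as np

# Toy embeddings for illustration only -- these 4-d values are invented
# so that king - man + woman lands exactly on queen.
emb = {
    "king":  np.array([0.8, 0.7, 0.1, 0.9]),
    "queen": np.array([0.8, 0.7, 0.9, 0.1]),
    "man":   np.array([0.2, 0.1, 0.1, 0.9]),
    "woman": np.array([0.2, 0.1, 0.9, 0.1]),
}

def analogy(a, b, c, emb):
    """Answer 'a is to b as c is to ?' with the vector-offset method:
    return the word (other than a, b, c) whose vector has the highest
    cosine similarity to emb[b] - emb[a] + emb[c]."""
    target = emb[b] - emb[a] + emb[c]
    best, best_sim = None, -1.0
    for w, v in emb.items():
        if w in (a, b, c):
            continue  # exclude the query words themselves
        sim = v @ target / (np.linalg.norm(v) * np.linalg.norm(target))
        if sim > best_sim:
            best, best_sim = w, sim
    return best

print(analogy("man", "king", "woman", emb))  # -> queen
```

With these toy vectors the offset king - man + woman equals the queen vector exactly, so the nearest remaining word is "queen"; with real embeddings the match is only approximate, which is the paper's point.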

Morphological Smoothing and Extrapolation of Word Embeddings

A latent-variable Gaussian graphical model is presented that extrapolates continuous representations for words not observed in the training corpus and smooths the representations provided for the observed words.

Urdu Word Embeddings

The skip-gram model is trained on more than 140 million Urdu words to create the first large-scale word embeddings for the Urdu language, which capture a high degree of syntactic and semantic similarity between words.

Deriving Adjectival Scales from Continuous Space Word Representations

This work pushes the interpretation of continuous space word representations further by demonstrating that vector offsets can be used to derive adjectival scales and evaluating the scales on the indirect answers to yes/no questions corpus.

Better Word Representations with Recursive Neural Networks for Morphology

This paper combines recursive neural networks, in which each morpheme is a basic unit, with neural language models to incorporate contextual information when learning morphologically-aware word representations, and proposes a novel model that builds representations for morphologically complex words from their morphemes.

Semantic Regularities in Document Representations

A new document analogy task is designed for testing the semantic regularities in document representations, and empirical evaluations over several state-of-the-art document representation models reveal that neural embedding based document representations work better on this analogy task than conventional methods.

Pattern-based methods for Improved Lexical Semantics and Word Embeddings

This dissertation shows that, despite their tremendous success, word embeddings suffer from several limitations, presents a set of pattern-based solutions to address these problems, and demonstrates that pattern-based methods can be superior.

Joint Word Representation Learning Using a Corpus and a Semantic Lexicon

A joint word representation learning method is proposed that simultaneously predicts the co-occurrences of two words in a sentence subject to the relational constraints given by the semantic lexicon, and statistically significantly outperforms previously proposed methods.

Discovering linguistic (ir)regularities in word embeddings through max-margin separating hyperplanes

The proposed model, SVMCos, is robust to a range of experimental choices made when training word embeddings, and this representation of the relationship obtains the best results in discovering linguistic regularities.

Sentence Analogies: Linguistic Regularities in Sentence Embeddings

This paper investigates to what extent commonly used sentence vector representation spaces also reflect certain kinds of regularities, and proposes a number of schemes to induce evaluation data, based on lexical analogy data as well as on semantic relationships between sentences.

Word Embedding Evaluation in Downstream Tasks and Semantic Analogies

The results show that a diverse and comprehensive corpus can often outperform a larger, less textually diverse corpus, and that batch training may cause quality loss in the authors' models.

References

A Neural Probabilistic Language Model

This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.

Neural Probabilistic Language Models

This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences, and incorporates this new language model into a state-of-the-art speech recognizer of conversational speech.

Efficient Estimation of Word Representations in Vector Space

Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.

Joint Learning of Words and Meaning Representations for Open-Text Semantic Parsing

This work proposes a method that learns to assign meaning representations (MRs) to a wide range of text, thanks to a training scheme that combines learning from knowledge bases with learning from raw text.

Continuous space language models

Word Representations: A Simple and General Method for Semi-Supervised Learning

This work evaluates Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking, and finds that each of the three word representations improves the accuracy of these baselines.

Distributed representations, simple recurrent networks, and grammatical structure

In this paper, three problems for a connectionist account of language are considered: 1. What is the nature of linguistic representations? 2. How can complex structural relationships such as ...

Discovering Binary Codes for Documents by Learning Deep Generative Models

A deep generative model in which the lowest layer represents the word-count vector of a document and the top layer represents a learned binary code for that document is described, which allows more accurate and much faster retrieval than latent semantic analysis.

Structured Output Layer neural network language model

A new neural network language model (NNLM) based on word clustering to structure the output vocabulary is proposed: the Structured Output Layer NNLM, which is able to handle vocabularies of arbitrary size, hence dispensing with the short-lists commonly used in NNLMs.

Hierarchical Probabilistic Neural Network Language Model

A hierarchical decomposition of the conditional probabilities, constrained by prior knowledge extracted from the WordNet semantic hierarchy, is introduced; it yields a speed-up of about 200 during both training and recognition.
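The decomposition that blurb describes can be illustrated with a minimal sketch: instead of normalising over the whole vocabulary, a word's probability is a product of binary branch decisions along its root-to-leaf path in a tree. The path and per-node logits below are hypothetical, made-up values:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical per-node (logit, branch) pairs along one word's
# root-to-leaf path in a balanced binary tree over the vocabulary;
# branch 1 means "go right" at that node, 0 means "go left".
path = [(0.4, 1), (-1.2, 0), (2.0, 1)]  # invented values for illustration

# P(word) is the product of branch probabilities along its path:
# about log2(V) sigmoid factors instead of a V-way softmax.
p = 1.0
for logit, branch in path:
    p *= sigmoid(logit) if branch == 1 else 1.0 - sigmoid(logit)

print(round(p, 4))
```

With a balanced tree over a vocabulary of size V, each word's probability costs O(log V) sigmoid evaluations rather than an O(V) softmax normalisation, which is the source of the reported speed-up.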