Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space

  title={Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space},
  author={Arvind Neelakantan and Jeevan Shankar and Alexandre Passos and Andrew McCallum},
There is rising interest in vector-space word embeddings and their use in NLP, especially given recent methods for their fast estimation at very large scale. [] Key Method It differs from recent related work by jointly performing word sense discrimination and embedding learning, by non-parametrically estimating the number of senses per word type, and by its efficiency and scalability. We present new state-of-the-art results in the word similarity in context task and demonstrate its scalability by training…

Figures and Tables from this paper

Multi Sense Embeddings from Topic Models

This work proposes a topic modeling based skip-gram approach for learning multi-prototype word embeddings and introduces a method to prune the embedDings determined by the probabilistic representation of the word in each topic.

Learning Context-Sensitive Word Embeddings with Neural Tensor Skip-Gram Model

A general architecture to learn the word and topic embeddings efficiently is presented, which is an extension to the Skip-Gram model and can model the interaction between words and topics simultaneously.

Different Contexts Lead to Different Word Embeddings

Experimental results show that the word representations learned by the proposed CBOW model outperform the competitive baselines and improve the quality of embeddings but also makes embeds suitable for polysemy.

PAWE: Polysemy Aware Word Embeddings

This work develops a new word embedding model that can accurately represent such words by automatically learning multiple representations for each word, whilst remaining computationally efficient.

Improving Distributed Representation of Word Sense via WordNet Gloss Composition and Context Clustering

The learned represen-tations outperform the publicly available embeddings on 2 out of 4 metrics in the word similarity task, and 6 out of 13 sub tasks in the analogical reasoning task.

Embedding Words and Senses Together via Joint Knowledge-Enhanced Training

This work proposes a new model which learns word and sense embeddings jointly and exploits large corpora and knowledge from semantic networks in order to produce a unified vector space of word and senses.

Embeddings in Natural Language Processing

This tutorial will provide a high-level synthesis of the main embedding techniques in NLP, in the broad sense, and start by conventional word embeddings and then move to other types of embedDings, such as sense-specific and graph alternatives.

Learning class-specific word embeddings

This work proposes to use class information to enhance the discriminativeness of words and presents a general framework consisting of a pair of convolutional neural networks to utilize the learned class-specific word embeddings as input for text classification tasks.

Contextualized Word Representations for Multi-Sense Embedding

Methods to generate multiple word representations for each word based on dependency structure relations are proposed that significantly outperformed state-of-the-art methods for multi-sense embeddings and show that the data sparseness problem is resolved due to the pre-training.

Gaussian Mixture Embeddings for Multiple Word Prototypes

This paper proposes the Gaussian mixture skip-gram model, and proposes the Dynamic GMSG (D-GMSG) model, a model that can be regarded as a gaussian mixture distribution in the embedded space, and each gaussian component represents a word sense.



Multi-View Learning of Word Embeddings via CCA

Low Rank Multi-View Learning (LR-MVL) is extremely fast, gives guaranteed convergence to a global optimum, is theoretically elegant, and achieves state-of-the-art performance on named entity recognition (NER) and chunking problems.

Lexicon Infused Phrase Embeddings for Named Entity Resolution

A new form of learning word embeddings that can leverage information from relevant lexicons to improve the representations, and the first system to use neural word embedDings to achieve state-of-the-art results on named-entity recognition in both CoNLL and Ontonotes NER are presented.

Word Representations: A Simple and General Method for Semi-Supervised Learning

This work evaluates Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeds of words on both NER and chunking, and finds that each of the three word representations improves the accuracy of these baselines.

Tailoring Continuous Word Representations for Dependency Parsing

It is found that all embeddings yield significant parsing gains, including some recent ones that can be trained in a fraction of the time of others, suggesting their complementarity.

Improving Word Representations via Global Context and Multiple Word Prototypes

A new neural network architecture is presented which learns word embeddings that better capture the semantics of words by incorporating both local and global document context, and accounts for homonymy and polysemy by learning multiple embedDings per word.

Efficient Estimation of Word Representations in Vector Space

Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.

Distributed Representations of Words and Phrases and their Compositionality

This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.

Distributed Representations of Sentences and Documents

Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents, and its construction gives the algorithm the potential to overcome the weaknesses of bag-of-words models.

Learning Word Vectors for Sentiment Analysis

This work presents a model that uses a mix of unsupervised and supervised techniques to learn word vectors capturing semantic term--document information as well as rich sentiment content, and finds it out-performs several previously introduced methods for sentiment classification.

A Neural Probabilistic Language Model

This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.