Do Multi-Sense Embeddings Improve Natural Language Understanding?

  title={Do Multi-Sense Embeddings Improve Natural Language Understanding?},
  author={Jiwei Li and Dan Jurafsky},
Learning a distinct representation for each sense of an ambiguous word could lead to more powerful and fine-grained models of vector-space representations. Yet while ‘multi-sense’ methods have been proposed and tested on artificial wordsimilarity tasks, we don’t know if they improve real natural language understanding tasks. In this paper we introduce a multisense embedding model based on Chinese Restaurant Processes that achieves state of the art performance on matching human word similarity… 

Figures and Tables from this paper

Multi-sense embeddings through a word sense disambiguation process

Context-Dependent Sense Embedding

A novel probabilistic model for sense embedding is proposed that is not based on problematic word embedding of polysemous words and takes into account the dependency between sense choices and outperforms the state-of-the-art model on a word sense induction task.

xSense: Learning Sense-Separated Sparse Representations and Textual Definitions for Explainable Word Sense Networks

A large and high-quality context-definition dataset that consists of sense definitions together with multiple example sentences per polysemous word, which is a valuable resource for definition modeling and word sense disambiguation is introduced.

Contextualized Word Representations for Multi-Sense Embedding

Methods to generate multiple word representations for each word based on dependency structure relations are proposed that significantly outperformed state-of-the-art methods for multi-sense embeddings and show that the data sparseness problem is resolved due to the pre-training.

Constructing High Quality Sense-specific Corpus and Word Embedding via Unsupervised Elimination of Pseudo Multi-sense

A novel framework for unsupervised corpus sense tagging is presented, and on the tasks of word similarity, word analogy as well as sentence understanding, the embeddings trained on sense-specific corpus obtain better results than the basic strategy which is applied in step (a).

On Modeling Sense Relatedness in Multi-prototype Word Embedding

A novel approach to capture word sense relatedness in multi-prototype word embedding model by introducing a random process to integrate these two types of senses and design two non-parametric methods for word sense induction.

Supervised word sense disambiguation using new features based on word embeddings

Four improvements to existing state-of-the-art WSD methods are proposed, including a new model for assigning vector coefficients for a more precise context representation and a PCA dimensionality reduction process to find a better transformation of feature matrices and train a more informative model.

Probing for Semantic Classes: Diagnosing the Meaning Content of Word Embeddings

A large dataset based on manual Wikipedia annotations and word senses, where word senses from different words are related by semantic classes is presented, and a classifier can accurately predict whether a word is single-sense or multi-sense, based only on its embedding.

Multi Sense Embeddings from Topic Models

This work proposes a topic modeling based skip-gram approach for learning multi-prototype word embeddings and introduces a method to prune the embedDings determined by the probabilistic representation of the word in each topic.



Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

A Sentiment Treebank that includes fine grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and presents new challenges for sentiment compositionality, and introduces the Recursive Neural Tensor Network.

Semantic Compositionality through Recursive Matrix-Vector Spaces

A recursive neural network model that learns compositional vector representations for phrases and sentences of arbitrary syntactic type and length and can learn the meaning of operators in propositional logic and natural language is introduced.

Learning Word Representation Considering Proximity and Ambiguity

Proximity-Ambiguity Sensitive (PAS) models are proposed to produce high quality distributed representations of words considering both word proximity and ambiguity, and the strength of pooling-structured neural networks in word representation learning is revealed.

Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space

An extension to the Skip-gram model that efficiently learns multiple embeddings per word type is presented, and its scalability is demonstrated by training with one machine on a corpus of nearly 1 billion tokens in less than 6 hours.

A Unified Model for Word Sense Representation and Disambiguation

A unified model for joint word sense representation and disambiguation, which will assign distinct representations for each word sense and improves the performance of contextual word similarity compared to existing WSR methods, outperforms state-of-the-art supervised methods on domainspecific WSD, and achieves competitive performance on coarse-grained all-words WSD.

Improving Word Representations via Global Context and Multiple Word Prototypes

A new neural network architecture is presented which learns word embeddings that better capture the semantics of words by incorporating both local and global document context, and accounts for homonymy and polysemy by learning multiple embedDings per word.

GloVe: Global Vectors for Word Representation

A new global logbilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods and produces a vector space with meaningful substructure.

Sense-Aaware Semantic Analysis: A Multi-Prototype Word Representation Model Using Wikipedia

SaSA is presented, a multi-prototype VSM for word representation based on Wikipedia, which could account for homonymy and polysemy and demonstrate its effectiveness on semantic relatedness for both isolated words and words in sentential contexts.

Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks

The Tree-LSTM is introduced, a generalization of LSTMs to tree-structured network topologies that outperform all existing systems and strong LSTM baselines on two tasks: predicting the semantic relatedness of two sentences and sentiment classification.

A unified architecture for natural language processing: deep neural networks with multitask learning

We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic