Contextual Text Understanding in Distributional Semantic Space

@inproceedings{Cheng2015ContextualTU,
  title={Contextual Text Understanding in Distributional Semantic Space},
  author={Jianpeng Cheng and Zhongyuan Wang and Ji-Rong Wen and Jun Yan and Zheng Chen},
  booktitle={Proceedings of the 24th ACM International Conference on Information and Knowledge Management},
  year={2015}
}
Representing discrete words in a continuous vector space turns out to be useful for natural language applications related to text understanding. At the same time, it poses extensive challenges, one of which is due to the polysemous nature of human language. A common solution (a.k.a. word sense induction) is to separate each word into multiple senses and create a representation for each sense respectively. However, this approach is usually computationally expensive and prone to data sparsity, since each…
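As a concrete illustration of the sense-induction strategy described above (one vector per induced sense), a common recipe clusters the context vectors of a word's occurrences and treats each cluster centroid as a sense vector. The sketch below is only illustrative and is not the method proposed in this paper; the toy corpus, the random stand-in word vectors, and the context_vector helper are assumptions made purely for the example.

```python
# Minimal sketch of word-sense induction by clustering context vectors,
# in the spirit of the multi-prototype approaches discussed on this page.
# The corpus, vector sizes, and helpers below are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy corpus: "bank" appears in two senses (finance vs. river).
corpus = [
    "the bank approved the loan and the mortgage".split(),
    "she deposited cash at the bank branch".split(),
    "the river bank was covered with reeds".split(),
    "they fished from the muddy bank of the river".split(),
]

# Stand-in word vectors (random here; pre-trained embeddings in practice).
vocab = sorted({w for sent in corpus for w in sent})
word_vec = {w: rng.normal(size=50) for w in vocab}

def context_vector(sent, idx, window=3):
    """Average the vectors of the words around position idx."""
    ctx = [w for j, w in enumerate(sent) if j != idx and abs(j - idx) <= window]
    return np.mean([word_vec[w] for w in ctx], axis=0)

# One context vector per occurrence of the ambiguous word.
occurrences = [
    context_vector(sent, i)
    for sent in corpus
    for i, w in enumerate(sent)
    if w == "bank"
]

# Cluster the occurrences; each centroid acts as one sense vector for "bank".
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(np.vstack(occurrences))
sense_vectors = kmeans.cluster_centers_
print("induced sense assignment per occurrence:", kmeans.labels_)
```

Note that this is exactly the kind of per-word, per-sense bookkeeping the abstract flags as computationally expensive and sparsity-prone.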
Multi Sense Embeddings from Topic Models
This work proposes a topic-modeling-based skip-gram approach for learning multi-prototype word embeddings and introduces a method to prune the embeddings determined by the probabilistic representation of the word in each topic.
Leveraging Conceptualization for Short-Text Embedding
A novel short-text conceptualization algorithm assigns the associated concepts to each short text; the conceptualization results are then introduced into learning conceptual short-text embeddings, which are more expressive than some widely used text representation models such as the latent topic model.
Entity disambiguation with context awareness in user-generated short texts
An entity disambiguation method that measures the correlation between terms and selects informative ones; it considers various types of contextual terms, not just entities, thereby mitigating the effects of text sparsity.
Concept Embedded Convolutional Semantic Model for Question Retrieval
This paper proposes to use a high-level feature embedded convolutional semantic model to learn the question embedding by inputting the concept embedding and word embedding, without manually labeling training data.
SoulMate: Short-text author linking through Multi-aspect temporal-textual embedding
A neural network-based temporal-textual framework that generates tightly connected author subgraphs from microblog short-text contents and employs a stack-wise graph cutting algorithm to extract the communities of related authors.
Learning word embeddings via context grouping
A grouping-based context-prediction model that considers the interactions of context words; it generalizes the widely used CBOW and Skip-Gram models and demonstrates that better representations can be learned with suitable context groups.
WELDA: Enhancing Topic Models by Incorporating Local Word Context
WELDA, a new type of topic model, combines word embeddings with latent Dirichlet allocation (LDA) to improve topic quality by estimating topic distributions in the word embedding space and exchanging selected topic words via Gibbs sampling from this space.
A Mixture Model for Learning Multi-Sense Word Embeddings
This paper proposes a mixture model for learning multi-sense word embeddings that generalizes previous work by allowing different weights to be induced for the different senses of a word.
Learning Unsupervised Knowledge-Enhanced Representations to Reduce the Semantic Gap in Information Retrieval
The Semantic-Aware Neural Framework for IR (SAFIR), an unsupervised knowledge-enhanced neural framework explicitly tailored for IR, is proposed, and its application is investigated in the medical domain, where the semantic gap is prominent and many specialized, manually curated knowledge resources are available.
Emphasizing Essential Words for Sentiment Classification Based on Recurrent Neural Networks
A keyword vocabulary is established and an LSTM-based model is proposed that is sensitive to the words in the vocabulary; hence, the keywords leverage the semantics of the full document.

References

Showing 1–10 of 48 references
Do Multi-Sense Embeddings Improve Natural Language Understanding?
A multi-sense embedding model based on Chinese Restaurant Processes is introduced that achieves state-of-the-art performance on matching human word similarity judgments, and a pipelined architecture for incorporating multi-sense embeddings into language understanding is proposed.
Measuring semantic relatedness using salient encyclopedic concepts
This work represents a novel approach to integrating world knowledge into current semantic models and a means of crossing the language boundary for a better and more robust semantic relatedness representation, thus opening the door to an improved abstraction of meaning that carries the potential of ultimately imparting understanding of natural language to machines.
Linguistic Regularities in Continuous Space Word Representations
The vector-space word representations that are implicitly learned by the input-layer weights are found to be surprisingly good at capturing syntactic and semantic regularities in language, and each relationship is characterized by a relation-specific vector offset.
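As a quick illustration of the relation-specific vector offset reported in that work, one can check the classic analogy arithmetic directly on word vectors. The tiny hand-made vectors below are assumptions for illustration only, not learned embeddings.

```python
# Minimal sketch of the vector-offset idea:
# vec("king") - vec("man") + vec("woman") should land near vec("queen").
# The 2-d toy vectors are hand-made assumptions (axis 1 ~ royalty, axis 2 ~ gender).
import numpy as np

emb = {
    "king":  np.array([0.9,  0.8]),
    "queen": np.array([0.9, -0.8]),
    "man":   np.array([0.1,  0.8]),
    "woman": np.array([0.1, -0.8]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = emb["king"] - emb["man"] + emb["woman"]
best = max((w for w in emb if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(query, emb[w]))
print(best)  # -> "queen"
```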
Improving Word Representations via Global Context and Multiple Word Prototypes
A new neural network architecture is presented which learns word embeddings that better capture the semantics of words by incorporating both local and global document context, and accounts for homonymy and polysemy by learning multiple embeddings per word.
Short text understanding through lexical-semantic analysis
This work uses lexical-semantic knowledge provided by a well-known semantic network for short text understanding and applies knowledge-intensive approaches that focus on semantics in all of these tasks.
Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space
An extension to the Skip-gram model that efficiently learns multiple embeddings per word type is presented, and its scalability is demonstrated by training with one machine on a corpus of nearly 1 billion tokens in less than 6 hours.
A Simple and Efficient Method to Generate Word Sense Representations
A simple model is presented that enables recent techniques for building word vectors to represent distinct senses of polysemous words; it is able to effectively discriminate between word senses and to do so in a computationally efficient manner.
Learning Structured Embeddings of Knowledge Bases
A learning process based on an innovative neural network architecture designed to embed symbolic knowledge-base representations into a more flexible continuous vector space in which the original knowledge is kept and enhanced; this would allow data from any KB to be easily used in recent machine learning methods for prediction and information retrieval.
Knowledge-Powered Deep Learning for Word Embedding
This study explores the capacity of leveraging morphological, syntactic, and semantic knowledge to achieve high-quality word embeddings, and explores these types of knowledge to define a new basis for word representation, provide additional input information, and serve as auxiliary supervision in deep learning.
A Neural Probabilistic Language Model
This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.