Efficient Correlated Topic Modeling with Topic Embedding

@article{He2017EfficientCT,
  title={Efficient Correlated Topic Modeling with Topic Embedding},
  author={Junxian He and Zhiting Hu and Taylor Berg-Kirkpatrick and Ying Huang and Eric P. Xing},
  journal={Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
  year={2017}
}
  • Junxian He, Zhiting Hu, Taylor Berg-Kirkpatrick, Ying Huang, Eric P. Xing
  • Published 1 July 2017
  • Computer Science, Mathematics
  • Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Correlated topic modeling has been limited to small model and problem sizes due to its high computational cost and poor scaling. In this paper, we propose a new model which learns compact topic embeddings and captures topic correlations through the closeness between the topic vectors. Our method enables efficient inference in the low-dimensional embedding space, reducing previous cubic or quadratic time complexity to linear w.r.t. the topic size. We further speed up variational inference with a…
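The idea in the abstract, that topic correlations come from closeness between topic vectors and that the cost is linear in the number of topics, can be illustrated with a minimal sketch. This is not the authors' model or inference procedure: the function names, the softmax over dot products, and the use of cosine similarity as a proxy for correlation are all assumptions made for illustration.

```python
import numpy as np

def topic_proportions(doc_vec, topic_vecs):
    """Hypothetical helper: softmax over dot products between one
    document vector and K topic vectors. Cost is O(K * dim), i.e.
    linear in the number of topics K."""
    scores = topic_vecs @ doc_vec      # shape (K,)
    scores = scores - scores.max()     # stabilise the exponentials
    weights = np.exp(scores)
    return weights / weights.sum()

def topic_similarity(topic_vecs):
    """Hypothetical helper: cosine similarity between topic vectors,
    used here as a stand-in for topic correlation."""
    norms = np.linalg.norm(topic_vecs, axis=1, keepdims=True)
    unit = topic_vecs / norms
    return unit @ unit.T               # shape (K, K)

# Toy sizes chosen for illustration only.
rng = np.random.default_rng(0)
num_topics, dim = 5, 3
topic_vecs = rng.normal(size=(num_topics, dim))
doc_vec = rng.normal(size=dim)

theta = topic_proportions(doc_vec, topic_vecs)
sim = topic_similarity(topic_vecs)
```

In this sketch the model stores only K vectors of size dim rather than a K × K covariance matrix, which is the kind of saving the abstract describes; the pairwise similarity matrix is computed only when it is needed for inspection.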
Graph Attention Topic Modeling Network
TLDR
A new method to overcome the overfitting issue of pLSI is provided by using the amortized inference with word embedding as input, instead of the Dirichlet prior in LDA.
A Novel Generative Topic Embedding Model by Introducing Network Communities
TLDR
A new generative topic embedding model is given which incorporates documents (with topics) and network (with communities) together, and uses probability transition to describe the relationship between topics and communities to make it robust when topics and communities do not match.
Combine Topic Modeling with Semantic Embedding: Embedding Enhanced Topic Model
TLDR
A novel integration framework is proposed to combine the two representation methods, where topic information can be transmitted into corresponding semantic embedding structure, and an Embedding Enhanced Topic Model is constructed, which can improve topic modeling and generate topic embeddings by leveraging the word embedding.
Rank-Integrated Topic Modeling: A General Framework
TLDR
A new method to integrate topical ranking with topic modeling and a general framework for topic modeling of documents with link structures is introduced by interpreting the normalized topical ranking score vectors as topic distributions for documents.
Recurrent Coupled Topic Modeling over Sequential Documents
TLDR
This work assumes that a current topic evolves from all prior topics with corresponding coupling weights, forming the multi-topic-thread evolution, and models the dependencies between evolving topics and thoroughly encodes their complex multi-couplings across time steps.
ASTM: An Attentional Segmentation Based Topic Model for Short Texts
TLDR
This work proposes a novel model, Attentional Segmentation based Topic Model (ASTM), to integrate both word embeddings as supplementary information and an attention mechanism that segments short text documents into fragments of adjacent words receiving similar attention.
An Embedding-Based Topic Model for Document Classification
TLDR
This article presents a two-stage algorithm for topic modelling that leverages word embeddings and word co-occurrence and demonstrates the remarkable comparative effectiveness of the proposed algorithm in a task of document classification.
PhraseCTM: Correlated Topic Modeling on Phrases within Markov Random Fields
TLDR
This work proposes a novel topic model, PhraseCTM, and a two-stage method to find correlated topics at the phrase level, and evaluates the method with a quantitative experiment and a human study, showing that correlated topic modeling on phrases is a good and practical way to interpret the underlying themes of a corpus.
3-in-1 Correlated Embedding via Adaptive Exploration of the Structure and Semantic Subspaces
TLDR
A novel generative model is proposed to formulate the generation process of the network and content from the embeddings, with respect to the Bayesian framework, and automatically leans to the information which is more discriminative.
Neural Models for Documents with Metadata
TLDR
A general neural framework is proposed, based on topic models, to enable flexible incorporation of metadata and allow for rapid exploration of alternative models, and achieves strong performance, with a manageable tradeoff between perplexity, coherence, and sparsity.

References

SHOWING 1-10 OF 45 REFERENCES
Reducing the sampling complexity of topic models
TLDR
An algorithm which scales linearly with the number of actually instantiated topics k_d in the document; for large document collections and in structured hierarchical models k_d ≪ k, which yields an order of magnitude speedup.
Nonparametric Spherical Topic Modeling with Word Embeddings
TLDR
This paper uses a Hierarchical Dirichlet Process for the base topic model and proposes an efficient inference algorithm based on Stochastic Variational Inference that enables it to naturally exploit the semantic structures of word embeddings while flexibly discovering the number of topics.
Independent factor topic models
TLDR
Independent Factor Topic Models (IFTM) are proposed which use linear latent variable models to uncover the hidden sources of correlation between topics to provide a fast Newton-Raphson based variational inference algorithm.
Scalable Inference for Logistic-Normal Topic Models
TLDR
This paper presents a partially collapsed Gibbs sampling algorithm that approaches the provably correct distribution by exploring the ideas of data augmentation and presents a parallel implementation that can deal with large-scale applications and learn the correlation structures of thousands of topics from millions of documents.
Latent Topic Embedding
TLDR
This work proposes a novel model named Latent Topic Embedding (LTE), which seamlessly integrates topic generation and embedding learning in one unified framework and proposes an efficient Monte Carlo EM algorithm to estimate the parameters of interest.
Pachinko allocation: DAG-structured mixture models of topic correlations
TLDR
Improved performance of PAM is shown in document classification, likelihood of held-out data, the ability to support finer-grained topics, and topical keyword coherence.
Manifold Learning for Jointly Modeling Topic and Visualization
TLDR
This work proposes an unsupervised probabilistic model, called SEMAFORE, which aims to preserve the manifold in the lower-dimensional spaces of the document manifold, and shows that it significantly outperforms the state-of-the-art baselines on objective evaluation metrics.
Generative Topic Embedding: a Continuous Representation of Documents
TLDR
A generative topic embedding model is proposed that performs better than eight existing methods, with fewer features, and can generate coherent topics even based on only one document.
Replicated Softmax: an Undirected Topic Model
We introduce a two-layer undirected graphical model, called a "Replicated Softmax", that can be used to model and automatically extract low-dimensional latent semantic representations from a large…
Gaussian LDA for Topic Models with Word Embeddings
TLDR
LDA's parameterization of topics as categorical distributions is replaced with multivariate Gaussian distributions on the embedding space, which encourages the model to group words that are a priori known to be semantically related into topics.