Source-LDA: Enhancing Probabilistic Topic Models Using Prior Knowledge Sources

Authors: Justin Wood, Patrick Tan, Wei Wang, Corey W. Arnold
Venue: 2017 IEEE 33rd International Conference on Data Engineering (ICDE)
Topic modeling has increasingly attracted interest from researchers. Common topic modeling methods usually produce a collection of unlabeled topics, where each topic is represented by a distribution over words. Associating semantic meaning with these word distributions is not always straightforward; traditionally, this task is left to human interpretation. Manually labeling the topics is unfortunately not always easy, as topics generated by unsupervised learning methods do not necessarily align…


An Enhanced Method for Topic Modeling using Concept-Latent

  • Christy A, Meera Gandhi G
  • Computer Science
    2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon)
  • 2019
An algorithm called Concept-Latent Dirichlet Allocation (Concept-LDA) is proposed, which incorporates reinforcement learning into topic modeling, in turn improving the accuracy of both the resulting topics and the topic labeling.

Knowledge Source Rankings for Semi-Supervised Topic Modeling

By applying the ranking technique, this approach can eliminate low-scoring article-topics before inference, speeding up the overall process, and can also improve perplexity and interpretability.

A Semi-supervised Hidden Markov Topic Model Based on Prior Knowledge

A topic model is presented which employs topically related knowledge from prior topics and word co-occurrences/relations in the collection, demonstrating improvements in both PMI score and topic coherence.

Knowledge Base Enhanced Topic Modeling

This paper treats knowledge bases, with their huge collections of entities and relations, as rich representations of human knowledge, and proposes a knowledge-base-enhanced topic model that improves on LDA for document classification while requiring no supervision.

A Bayesian Topic Model for Human-Evaluated Interpretability

This paper aims to improve interpretability in topic modeling by introducing a novel interpretable topic model that combines nonparametric and weakly supervised topic models and outperforms existing approaches.

Transfer Topic Labeling with Domain-Specific Knowledge Base: An Analysis of UK House of Commons Speeches 1935-2014

This work presents a semi-automatic transfer topic labeling method, using the coding instructions of the Comparative Agendas Project to label topics, and shows that it works well for a majority of the topics it estimates, but finds that institution-specific topics require manual input.

Creating Prior-Knowledge of Source-LDA for Topic Discovery in Citation Network

An approach to automatically construct a knowledge source for Source-LDA from unlabeled data, under the assumption that a paper will often cite papers containing related topics; it also resolves the short-text issue by using information from the citation network.

Transfer learning for topic labeling: Analysis of the UK House of Commons speeches 1935–2014

The proposed transfer topic labeling method was simple to implement, compared favorably to expert judgments, and outperformed the neural networks model for a majority of the topics the authors estimated.

CitationLDA++: an Extension of LDA for Discovering Topics in Document Network

This paper proposes the CitationLDA++ model, which improves the performance of the LDA algorithm in inferring paper topics based on the title and/or abstract together with citation information, dividing the dataset into two sets used in the Gibbs-sampling inference process.

Modeling Queries with Contextual Snippets for Information Retrieval

This work proposes an approach that adapts the PRF-based contextual snippets into a context-aware topic model to enhance query representations and establishes a bridge between the snippets and the corresponding PRF documents, which can be used for modeling the topics more precisely and efficiently.

Automatic labeling of multinomial topic models

Probabilistic approaches to automatically labeling multinomial topic models in an objective way are proposed and can be applied to labeling topics learned through all kinds of topic models such as PLSA, LDA, and their variations.

Automatic labeling hierarchical topics

This paper proposes two effective algorithms that automatically assign concise labels to each topic in a hierarchy by exploiting sibling and parent-child relations among topics and shows that the inter-topic relation is effective in boosting topic labeling accuracy.

Probabilistic Explicit Topic Modeling Using Wikipedia

Two methods for probabilistic explicit topic modeling are introduced that overcome the nonidentifiability, isolation, and uninterpretability of LDA output; they are assessed by means of crowd-sourced user studies on two tasks: topic label generation and document label generation.

Incorporating Lexical Priors into Topic Models

This work proposes a simple and effective way to guide topic models to learn topics of specific interest to a user by providing sets of seed words that a user believes are representative of the underlying topics in a corpus.

Combining Background Knowledge and Learned Topics

A new probabilistic framework for combining a hierarchy of human-defined semantic concepts with a statistical topic model is reviewed, seeking the best of both worlds; results indicate that this combination leads to systematic improvements in generalization performance and enables new techniques for inferring and visualizing the content of a document.

Text Modeling using Unsupervised Topic Models and Concept Hierarchies

This paper proposes a probabilistic framework that combines a hierarchy of human-defined semantic concepts with statistical topic models, seeking the best of both worlds; results indicate that this combination leads to systematic improvements in the quality of the associated language models.

Reading Tea Leaves: How Humans Interpret Topic Models

New quantitative methods for measuring semantic meaning in inferred topics are presented, showing that they capture aspects of the model that are undetected by previous measures of model quality based on held-out likelihood.

Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora

Labeled LDA is introduced: a topic model that constrains Latent Dirichlet Allocation by defining a one-to-one correspondence between LDA's latent topics and user tags, allowing Labeled LDA to directly learn word-tag correspondences.

Finding scientific topics

  • T. Griffiths, M. Steyvers
  • Computer Science
    Proceedings of the National Academy of Sciences of the United States of America
  • 2004
A generative model for documents, introduced by Blei, Ng, and Jordan, is described, and a Markov chain Monte Carlo algorithm is presented for inference in this model; the approach is used to analyze abstracts from PNAS, with Bayesian model selection establishing the number of topics.
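The Markov chain Monte Carlo inference this entry refers to is commonly realized as collapsed Gibbs sampling over per-token topic assignments. A minimal sketch follows; the function name, toy corpus, and hyperparameters (alpha, beta) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def lda_gibbs(docs, V, K, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Minimal collapsed Gibbs sampler for LDA (illustrative sketch).

    docs: list of documents, each a list of word ids in [0, V).
    Returns doc-topic (ndk) and topic-word (nkw) count matrices.
    """
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), K))   # doc d -> topic k counts
    nkw = np.zeros((K, V))           # topic k -> word w counts
    nk = np.zeros(K)                 # total tokens per topic
    z = [rng.integers(K, size=len(d)) for d in docs]  # random init
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                     # remove current assignment
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # full conditional p(z_i = k | rest)
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k                     # resample, restore counts
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return ndk, nkw
```

Normalizing the rows of `nkw` (plus the beta prior) yields the per-topic word distributions; Bayesian model selection over K, as in the paper summarized above, would compare such models at different topic counts.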

Transfer Topic Modeling with Ease and Scalability

This paper develops Transfer Hierarchical LDA (thLDA) model, which incorporates the label information from other domains via informative priors and demonstrates the effectiveness of the model on both a microblogging dataset and standard text collections including AP and RCV1 datasets.