Corpus ID: 8443288

Large-Scale Hierarchical Topic Models

@inproceedings{Pujara2012LargeScaleHT,
  title={Large-Scale Hierarchical Topic Models},
  author={Jay Pujara},
  year={2012}
}
In the past decade, a number of advances in topic modeling have produced sophisticated models that are capable of generating hierarchies of topics. One challenge for these models is scalability: they are incapable of working at the massive scale of millions of documents and hundreds of thousands of terms. We address this challenge with a technique that learns a hierarchy of topics by iteratively applying topic models and processing subtrees of the hierarchy in parallel. This approach has a… 
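
The abstract stops short of the details, so the following is a minimal sketch, assuming only the stated recipe: fit a flat topic model at a node, partition documents by their dominant topic, and expand all subtrees of a level in parallel. Every name and parameter here (split_by_topic, build_hierarchy, depth, branching, min_docs) is an illustrative assumption, with scikit-learn's LDA standing in for whatever flat model the paper uses.

    # Illustrative sketch (not the paper's code): build a topic hierarchy by
    # iteratively applying a flat topic model and expanding subtrees in parallel.
    from concurrent.futures import ProcessPoolExecutor
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    def split_by_topic(docs, branching):
        """Fit one flat LDA model and split docs by their dominant topic."""
        vec = CountVectorizer(max_features=10_000, stop_words="english")
        X = vec.fit_transform(docs)
        lda = LatentDirichletAllocation(n_components=branching, random_state=0)
        dominant = lda.fit_transform(X).argmax(axis=1)   # hard assignment
        vocab = np.array(vec.get_feature_names_out())
        top_words = [vocab[c.argsort()[-10:]].tolist() for c in lda.components_]
        children = [[d for d, z in zip(docs, dominant) if z == k]
                    for k in range(branching)]
        return top_words, children

    def build_hierarchy(docs, depth=2, branching=5, min_docs=50):
        """Level-by-level construction; subtrees at a level run in parallel."""
        tree, frontier = {}, [((), docs)]
        with ProcessPoolExecutor() as pool:
            for _ in range(depth):
                work = [(p, d) for p, d in frontier if len(d) >= min_docs]
                splits = pool.map(split_by_topic, [d for _, d in work],
                                  [branching] * len(work))
                frontier = []
                for (path, _), (tops, children) in zip(work, splits):
                    for k, (top, sub) in enumerate(zip(tops, children)):
                        tree[path + (k,)] = top       # node label: top words
                        frontier.append((path + (k,), sub))
        return tree

The level-by-level frontier keeps a single process pool alive, avoiding nested pools while still exploiting the fact that sibling subtrees never share data.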

Citations

Scalable Training of Hierarchical Topic Models
This paper proposes an efficient partially collapsed Gibbs sampling algorithm for hLDA, as well as an initialization strategy to deal with local optima introduced by tree-structured models, and identifies new system challenges in building scalable systems for HTMs.
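
For context on what such samplers scale up: the collapsed Gibbs update for flat LDA, which partially collapsed samplers for hLDA generalize to paths in a tree, is the following per-token resampling. This is a textbook sketch, not the paper's PCGS algorithm itself.

    import numpy as np

    def gibbs_sweep(z, docs, n_dk, n_kw, n_k, alpha, beta, V):
        """One sweep of collapsed Gibbs sampling for flat LDA.
        z[d][i] is the topic of token i in document d; count matrices are
        n_dk (doc-topic), n_kw (topic-word), n_k (topic totals)."""
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1   # remove token
                # p(z = k | rest) ∝ (n_dk + α) · (n_kw + β) / (n_k + Vβ)
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
                k = np.random.choice(len(p), p=p / p.sum())
                z[d][i] = k
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1   # reinsert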
Scalable and Robust Construction of Topical Hierarchies
A scalable and robust algorithm is proposed for constructing a hierarchy of topics from a text collection based on a tensor orthogonal decomposition technique, which reduces the time of construction by several orders of magnitude and renders it possible for users to interactively revise the hierarchy.
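
The workhorse behind this line of work (here and in STROD below) is the tensor power method: repeatedly apply T(I, v, v), normalize, then deflate the recovered rank-one component. A bare-bones sketch on an already whitened symmetric third-order moment tensor; the robust variant with multiple random restarts is omitted.

    import numpy as np

    def tensor_power_decompose(T, n_components, n_iter=100, seed=0):
        """Eigen-pairs of a symmetric 3-way tensor via power iteration
        with deflation. Assumes T has already been whitened."""
        rng = np.random.default_rng(seed)
        T = T.copy()
        factors = []
        for _ in range(n_components):
            v = rng.standard_normal(T.shape[0])
            v /= np.linalg.norm(v)
            for _ in range(n_iter):
                v = np.einsum("ijk,j,k->i", T, v, v)    # v <- T(I, v, v)
                v /= np.linalg.norm(v)
            lam = np.einsum("ijk,i,j,k->", T, v, v, v)  # λ = T(v, v, v)
            factors.append((lam, v))
            T -= lam * np.einsum("i,j,k->ijk", v, v, v) # deflate rank-1 part
        return factors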
Construction and Quality Evaluation of Heterogeneous Hierarchical Topic Models
In our work, we propose to represent an HTM as a set of flat models, or layers, and a set of topical hierarchies, or edges. We suggest several quality measures for edges of hierarchical models…
HTMOT : Hierarchical Topic Modelling Over Time
This study proposes a novel method, HTMOT, to perform Hierarchical Topic Modelling Over Time, and shows that applying time modelling only to deep sub-topics extracts specific stories or events, while high-level topics capture larger themes in the corpus.
Towards Interactive Construction of Topical Hierarchy: A Recursive Tensor Decomposition Approach
A novel method is proposed, called STROD, that allows efficient and consistent modification of topic hierarchies, based on a recursive generative model and a scalable tensor decomposition inference algorithm with theoretical performance guarantees.
Additive Regularization for Hierarchical Multimodal Topic Modeling
The authors use a non-Bayesian multicriteria approach called Additive Regularization of Topic Models (ARTM), which makes it possible to combine any topic model formalized via log-likelihood maximization with additive regularization criteria; they extend this approach to topical hierarchies and show that it scales well to large text collections.
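
In symbols, ARTM drops Bayesian priors and instead maximizes the log-likelihood plus a weighted sum of regularizers over the topic-word matrix \Phi and topic-document matrix \Theta (standard ARTM notation assumed here, not taken from this paper):

    \sum_{d,w} n_{dw} \ln \sum_{t} \phi_{wt}\,\theta_{td}
      \;+\; \sum_{i} \tau_i\, R_i(\Phi, \Theta) \;\longrightarrow\; \max_{\Phi,\Theta}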
Scalable Inference for Nested Chinese Restaurant Process Topic Models
A novel partially collapsed Gibbs sampling (PCGS) algorithm is proposed, which combines the advantages of collapsed and instantiated-weight algorithms to achieve good scalability as well as high model quality, and an initialization strategy is presented to further improve model quality.
Probabilistic Model of Narratives Over Topical Trends in Social Media: A Discrete Time Model
A novel event-based narrative summary extraction framework is designed as a probabilistic topic model with a categorical time distribution, followed by extractive text summarization; it is effective at identifying topical trends as well as extracting narrative summaries from timestamped text corpora.

References

Hierarchical Topic Models and the Nested Chinese Restaurant Process
A Bayesian approach is taken to generate an appropriate prior via a distribution on partitions that allows arbitrarily large branching factors and readily accommodates growing data collections.
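
Concretely, the nested CRP assigns each document a root-to-leaf path by running a Chinese restaurant process at every node: an existing child is chosen with probability proportional to how many earlier documents passed through it, and a brand-new child with probability proportional to γ. A minimal sketch of one path draw; the nested-dict tree representation is an illustrative choice.

    import random

    def ncrp_path(tree, depth, gamma):
        """Draw one root-to-leaf path from a nested CRP prior.
        tree maps child index -> [customer_count, subtree]; mutated in place."""
        path, node = [], tree
        for _ in range(depth):
            total = sum(c for c, _ in node.values()) + gamma
            r = random.uniform(0, total)
            for k, (c, _) in node.items():
                r -= c
                if r <= 0:
                    break                      # sit at an existing table
            else:
                k = max(node, default=-1) + 1  # open a new table (new child)
                node[k] = [0, {}]
            node[k][0] += 1                    # seat this document here
            path.append(k)
            node = node[k][1]
        return path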
Mr. LDA: a flexible large scale topic modeling package using variational inference in MapReduce
This paper introduces Mr. LDA, a novel and flexible large-scale topic modeling package in MapReduce that uses variational inference, which fits easily into a distributed environment and is easily extensible.
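
Variational inference fits the distributed setting because the per-document E-step is independent given the current topics, and the M-step just sums per-document expected counts — exactly a map and a reduce. A self-contained sketch of that split in MapReduce shape (plain Python; not Mr. LDA's actual API):

    import numpy as np
    from scipy.special import digamma

    def e_step(counts, topics, alpha=0.1, n_iter=20):
        """Per-document variational E-step for LDA (counts: {word_id: count}).
        Returns this document's expected topic-word counts."""
        K, _ = topics.shape
        words = np.array(list(counts))
        cts = np.array([counts[w] for w in words], dtype=float)
        gamma = np.full(K, alpha + cts.sum() / K)
        for _ in range(n_iter):
            # phi[k, j] ∝ topics[k, w_j] * exp(E[log theta_k])
            phi = topics[:, words] * np.exp(digamma(gamma))[:, None]
            phi /= phi.sum(axis=0, keepdims=True)
            gamma = alpha + (phi * cts).sum(axis=1)
        return {(k, w): phi[k, j] * cts[j]
                for k in range(K) for j, w in enumerate(words)}

    def mapper(doc_id, counts, topics):
        # E-step is per-document, so documents shard freely across mappers.
        for key, val in e_step(counts, topics).items():
            yield key, val

    def reducer(key, values):
        # M-step: sum expected counts into the new topic-word statistics.
        yield key, sum(values)

Each mapper needs only the current topics and its shard of documents, so the sole global synchronization per iteration is the reduce.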
Pachinko Allocation: Scalable Mixture Models of Topic Correlations
Statistical topic models are increasingly popular tools for summarization and manifold discovery in discrete data. However, the majority of existing approaches capture no or limited correlations between topics.
Parallelized Variational EM for Latent Dirichlet Allocation: An Experimental Evaluation of Speed and Scalability
This work builds parallel implementations of the variational EM algorithm for LDA in a multiprocessor architecture as well as a distributed setting, and indicates that while both implementations achieve speed-ups, the distributed version achieves dramatic improvements in both speed and scalability.
Reading Tea Leaves: How Humans Interpret Topic Models
New quantitative methods for measuring semantic meaning in inferred topics are presented, showing that they capture aspects of the model that are undetected by previous measures of model quality based on held-out likelihood.
Latent Dirichlet Allocation
Parallel Inference for Latent Dirichlet Allocation on Graphics Processing Units
A novel data partitioning scheme is proposed that effectively reduces the memory cost of parallelizing two inference methods for latent Dirichlet allocation on GPUs: collapsed Gibbs sampling and collapsed variational Bayesian inference.
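
A common way such partitioning schemes avoid conflicts (a sketch in this spirit; the paper's exact layout may differ) is to split documents and vocabulary into P stripes each and, in round t, let worker p own block (p, (p + t) mod P), so no two concurrent workers share a document row or a word column:

    def schedule(P):
        """For each of P rounds, yield the blocks processed concurrently;
        within a round, no document stripe or word stripe repeats."""
        for t in range(P):
            yield [(p, (p + t) % P) for p in range(P)]

    for blocks in schedule(4):
        print(blocks)   # round 0: [(0, 0), (1, 1), (2, 2), (3, 3)], ...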
Variational Inference for the Nested Chinese Restaurant Process
To employ variational methods, a tree-based stick-breaking construction of the nCRP mixture model is derived, and a novel variational algorithm is developed that efficiently explores a posterior over a large set of combinatorial structures.
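
The stick-breaking construction mentioned here represents a random discrete distribution by breaking off Beta-distributed fractions of a unit stick, β_k = v_k ∏_{j<k}(1 − v_j) with v_j ~ Beta(1, α); the paper's tree-based variant runs one such construction at each node over its children. A truncated single-node sketch:

    import numpy as np

    def stick_breaking(alpha, truncation, seed=0):
        """Truncated GEM(alpha) stick-breaking weights for one node's children."""
        rng = np.random.default_rng(seed)
        v = rng.beta(1.0, alpha, size=truncation)
        remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
        beta = v * remaining
        beta[-1] = 1.0 - beta[:-1].sum()   # fold leftover stick into last child
        return beta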
Special Database 22 and NIST TREC document database