Topic Grouper: An Agglomerative Clustering Approach to Topic Modeling

@article{Pfeifer2019TopicGA,
  title={Topic Grouper: An Agglomerative Clustering Approach to Topic Modeling},
  author={Daniel Pfeifer and Jochen L. Leidner},
  journal={ArXiv},
  year={2019},
  volume={abs/1904.06483}
}

We introduce Topic Grouper as a complementary approach in the field of probabilistic topic modeling. [...]

Key method: The algorithm starts with one-word topics and joins two topics at every step. It therefore generates a solution for every desired number of topics, ranging from the size of the training vocabulary down to one. The process is an agglomerative clustering that corresponds to a binary tree of topics. A resulting tree may act as a containment hierarchy, typically with more general topics…
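The merge loop described above can be illustrated with a minimal sketch. It is not the authors' implementation. The merge criterion here (`cooccurrence_score`, the number of documents containing words from both topics, normalized by the product of topic sizes) is a hypothetical stand-in, and the paper's actual criterion is not reproduced. The names `agglomerate`, `solutions` and `merges` are also illustrative. The sketch shows the structure the abstract describes: start from one-word topics, join the best-scoring pair at each step, record one partition for every topic count from |V| down to 1, and keep the merge history as a binary tree.

```python
from itertools import combinations


def cooccurrence_score(a, b, docs):
    """Hypothetical merge criterion (NOT the paper's): documents that contain
    at least one word from each topic, normalized by |a| * |b|."""
    hits = sum(1 for d in docs if d & a and d & b)
    return hits / (len(a) * len(b))


def agglomerate(vocab, docs, score=cooccurrence_score):
    # Start with one one-word topic per vocabulary entry.
    topics = [frozenset([w]) for w in vocab]
    solutions = {len(topics): list(topics)}  # topic count -> partition
    merges = []  # binary tree as (left child, right child, parent) triples
    while len(topics) > 1:
        # Greedily pick the pair of current topics with the highest score
        # (ties go to the first pair in index order).
        i, j = max(combinations(range(len(topics)), 2),
                   key=lambda ij: score(topics[ij[0]], topics[ij[1]], docs))
        left, right = topics[i], topics[j]
        parent = left | right
        # Join the two topics; exactly one topic fewer after each step.
        topics = [t for k, t in enumerate(topics) if k not in (i, j)]
        topics.append(parent)
        merges.append((left, right, parent))
        solutions[len(topics)] = list(topics)
    return solutions, merges


# Toy usage: each document is a set of words.
docs = [{"goal", "match", "team"}, {"team", "coach"},
        {"vote", "party"}, {"party", "election", "vote"}]
vocab = sorted(set().union(*docs))
solutions, merges = agglomerate(vocab, docs)
print([sorted(t) for t in solutions[2]])
# [['election', 'party', 'vote'], ['coach', 'goal', 'match', 'team']]
```

Because every step removes exactly one topic, a single run yields a partition for every topic count, and the `merges` list encodes the binary tree. Rescoring all pairs at each step costs O(|V|^2) score calls, so this naive loop only suits toy vocabularies.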
4 Citations
Novel semantic tagging detection algorithms based non-negative matrix factorization
A novel learning tagging model called semantic non-negative matrix factorization is proposed. It uses a semantic text representation, built with a knowledge-based approach, to extract the term-topic matrix and the topic-document matrix.
Nobody Said it Would be Easy: A Decade of R&D Projects in Information Access from Thomson over Reuters to Refinitiv
This talk attempts a critical assessment of what academia can and cannot do for industry, and what industry can do for research, in terms of R&D efforts.
Effective interrelation of Bayesian nonparametric document clustering and embedded-topic modeling
  • Gianni Costa, Riccardo Ortale
  • 2021
Topic modeling can be synergistically interrelated with document clustering. We present an innovative unsupervised approach to the interrelationship of topic modeling with document clustering.
