HiTR: Hierarchical Topic Model Re-Estimation for Measuring Topical Diversity of Documents

Hosein Azarbonyad, Mostafa Dehghani, Tom Kenter, Maarten Marx, J. Kamps, Maarten de Rijke
IEEE Transactions on Knowledge and Data Engineering
A high degree of topical diversity is often considered to be an important characteristic of interesting text documents. A recent proposal for measuring topical diversity identifies three distributions for assessing the diversity of documents: distributions of words within documents, words within topics, and topics within documents. Topic models play a central role in this approach and, hence, their quality is crucial to the efficacy of measuring topical diversity. The quality of topic models is… 
Deep Topic Modeling by Multilayer Bootstrap Network and Lasso
A polynomial-time deep topic model with no model and data assumptions is proposed, which first applies multilayer bootstrap network (MBN) to reduce the dimension of documents, and then uses the low-dimensional data representations or their clustering results as the target of supervised Lasso for topic word discovery.
Exploratory search over semi-structured documents
This thesis studies how metadata and structure associated with textual documents can help support exploratory search tasks, first examining how metadata and structure can be exploited to manage documents and support access to semi-structured documents more effectively.
2019 Index IEEE Transactions on Knowledge and Data Engineering Vol. 31
This index covers all technical items—papers, correspondence, reviews, etc.—that appeared in this periodical during 2019, and items from previous years that were commented upon or corrected in 2019.
Enhancing Factorization Machines with Generalized Metric Learning
A Mahalanobis distance method and a deep neural network method, which can effectively model the linear and non-linear correlations between features, respectively, are presented, and an efficient approach for simplifying the model functions is designed.
Being Sagacious towards Proliferated Post-Purchase Sharing: A Novel Disclosure Pattern-Wise Helpful Online Reviews Extraction Method
The proliferation of social media platforms has spurred research on helpful online reviews. Prior studies have ubiquitously taken subjective indicators to measure online review helpfulness, such as
Learning with imperfect supervision for language understanding
It is argued that even noisy and limited signals can contain a great deal of valid information that can be incorporated along with prior knowledge and biases that are encoded into learning algorithms in order to solve complex problems.
HinCTI: A Cyber Threat Intelligence Modeling and Identification System Based on Heterogeneous Information Network
This work is the first to model CTI on HIN for threat identification and proposes a heterogeneous GCN-based approach for threat type identification of infrastructure nodes, which can greatly relieve security analysts of heavy analysis work and help protect organizations efficiently against cyber-attacks.


Hierarchical Re-estimation of Topic Models for Measuring Topical Diversity
This work proposes a hierarchical re-estimation approach for topic models to combat generality and impurity, and outperforms the state of the art on the PubMed dataset, which is commonly used for diversity experiments.
The dual-sparse topic model: mining focused topics and focused terms in short text
A dual-sparse topic model is proposed that addresses the sparsity in both the topic mixtures and the word usage and outperforms both classical topic models and existing sparsity-enhanced topic models.
Parsimonious Topic Models with Salient Word Discovery
This work derives a Bayesian Information Criterion (BIC) from a parsimonious topic model for text corpora, and identifies an effective sample size and corresponding penalty specific to each parameter type in this model.
Text-based measures of document diversity
Quantitative notions of diversity have been explored across a variety of disciplines ranging from conservation biology to economics. However, there has been relatively little work on measuring the
Integrating Document Clustering and Topic Modeling
A multi-grain clustering topic model (MGCTM) which integrates document clustering and topic modeling into a unified framework and jointly performs the two tasks to achieve the overall best performance is proposed.
Improving Topic Coherence with Regularized Topic Models
This work proposes two methods to regularize the learning of topic models by creating a structured prior over words that reflects broad patterns in external data, making topic models more useful across a broader range of text data.
A biterm topic model for short texts
The approach can discover more prominent and coherent topics and significantly outperforms baseline methods on several evaluation metrics. It is also found that BTM can outperform LDA even on normal texts, showing the potential generality and wider usage of the new topic model.
Exploring the Space of Topic Coherence Measures
This work is the first to propose a framework for constructing existing word-based coherence measures, as well as new ones, by combining elementary components, and shows that new combinations of components outperform existing measures with respect to correlation with human ratings.
Are Topically Diverse Documents Also Interesting?
There is a relatively low correlation between interestingness and topical diversity. There are two extreme categories of documents: focused documents that are highly interesting but hardly diverse, and documents that are highly diverse but not interesting. When these two extreme types of documents are removed, there is a positive correlation between interestingness and diversity.
Machine Reading Tea Leaves: Automatically Evaluating Topic Coherence and Topic Model Quality
This work explores the two tasks of automatic evaluation of single topics and automatic evaluation of whole topic models, and provides recommendations on the best strategy for performing the two tasks, in addition to providing an open-source toolkit for topic and topic model evaluation.