# A phrase mining framework for recursive construction of a topical hierarchy

• Published 11 August 2013
• Computer Science
• Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
A high quality hierarchical organization of the concepts in a dataset at different levels of granularity has many valuable applications such as search, summarization, and content browsing. In this paper we propose an algorithm for recursively constructing a hierarchy of topics from a collection of content-representative documents. We characterize each topic in the hierarchy by an integrated ranked list of mixed-length phrases. Our mining framework is based on a phrase-centric view for…
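The abstract is truncated before it describes the algorithm, so the following is not the authors' method. It is a minimal Python sketch of the general idea it names: build the hierarchy recursively, and characterize each node with one integrated ranked list of mixed-length phrases. Two choices are assumptions made for illustration only. Phrases are ranked by raw n-gram frequency, which is crude because it favors short phrases. `split` is a hypothetical caller-supplied function that stands in for whatever clustering step a real system would use.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Doc = List[str]                  # a document as a list of tokens
Ranked = List[Tuple[str, int]]   # (phrase, count), best first

@dataclass
class Topic:
    phrases: Ranked                                   # one mixed-length ranked list
    children: List["Topic"] = field(default_factory=list)

def rank_phrases(docs: List[Doc], max_len: int = 3, top_k: int = 5,
                 min_count: int = 2) -> Ranked:
    """Rank contiguous n-grams of length 1..max_len together, by frequency (assumed score)."""
    counts: Counter = Counter()
    for doc in docs:
        for n in range(1, max_len + 1):
            for i in range(len(doc) - n + 1):
                counts[" ".join(doc[i:i + n])] += 1
    frequent = [(p, c) for p, c in counts.items() if c >= min_count]
    # Ties go to longer phrases, then alphabetical order, so results are deterministic.
    frequent.sort(key=lambda pc: (-pc[1], -len(pc[0].split()), pc[0]))
    return frequent[:top_k]

def build_hierarchy(docs: List[Doc], split: Callable[[List[Doc]], List[List[Doc]]],
                    depth: int, min_docs: int = 2) -> Topic:
    """Characterize this node by its phrases, then recurse into each subtopic."""
    topic = Topic(phrases=rank_phrases(docs))
    if depth == 0 or len(docs) < min_docs:
        return topic
    for group in split(docs):
        if 0 < len(group) < len(docs):   # recurse only on non-empty proper subsets
            topic.children.append(build_hierarchy(group, split, depth - 1, min_docs))
    return topic

def show(topic: Topic, indent: int = 0) -> None:
    print("  " * indent + ", ".join(p for p, _ in topic.phrases))
    for child in topic.children:
        show(child, indent + 1)

def split_on_learning(ds: List[Doc]) -> List[List[Doc]]:
    """Hypothetical keyword split standing in for a real clustering step."""
    return [[d for d in ds if "learning" in d], [d for d in ds if "learning" not in d]]

docs = [s.split() for s in ["machine learning model", "deep learning model",
                            "query processing index", "query processing cost"]]
root = build_hierarchy(docs, split_on_learning, depth=1)
show(root)
```

On this toy corpus, the root's list mixes bigrams such as "learning model" and "query processing" with unigrams. Each child then ranks only the phrases of its own subset.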

## Citations

### Scalable and Robust Construction of Topical Hierarchies

• Computer Science
• ArXiv
• 2014
A scalable and robust algorithm is proposed for constructing a hierarchy of topics from a text collection based on a tensor orthogonal decomposition technique, which reduces construction time by several orders of magnitude and makes it possible for users to interactively revise the hierarchy.

### CITPM: A Cluster-Based Iterative Topical Phrase Mining Framework

• Computer Science
• DASFAA
• 2016
A novel framework CITPM for topical phrase mining is presented, which views a corpus as a mixture of clusters (domains), and each cluster is characterized by documents sharing similar topical distributions.

### Constructing topical hierarchies in heterogeneous information networks

• Computer Science
• 2013 IEEE 13th International Conference on Data Mining
• 2013
This work presents a method for recursively constructing multi-typed topical hierarchies, using a newly designed clustering and ranking algorithm for heterogeneous network data together with the mining and ranking of topical patterns of different types.

### Content coverage maximization on word networks for hierarchical topic summarization

• Computer Science
• CIKM
• 2013
A new approach to text modeling via network analysis is proposed, and a simple method based on influence analysis proves effective compared with existing generative topic modeling and random-walk-based ranking.

### Hierarchical topic map generation for exploratory browsing

This thesis proposes a novel model for automatically generating a topic map for a document corpus without supervision, which helps users navigate the corpus efficiently and reach the desired document.

### An Efficient Method for High Quality and Cohesive Topical Phrase Mining

• Computer Science
• IEEE Transactions on Knowledge and Data Engineering
• 2019
This framework integrates a quality guaranteed phrase mining method, a novel topic model incorporating the constraint of phrases, and a novel document clustering method into an iterative framework to improve both phrase quality and topical cohesion.

### TaxoGen: Unsupervised Topic Taxonomy Construction by Adaptive Term Embedding and Clustering

• Computer Science
• KDD
• 2018
The method, TaxoGen, uses term embeddings and hierarchical clustering to construct a topic taxonomy in a recursive fashion and consists of an adaptive spherical clustering module for allocating terms to proper levels when splitting a coarse topic into fine-grained ones.

### TaxoGen: Constructing Topical Concept Taxonomy by Adaptive Term Embedding and Clustering

• Computer Science
• KDD 2018
• 2018
The method, TaxoGen, uses term embeddings and hierarchical clustering to construct a topic taxonomy in a recursive fashion and consists of an adaptive spherical clustering module for allocating terms to proper levels when splitting a coarse topic into fine-grained ones.

### Constructing Topic Hierarchies from Social Media Data

• Computer Science
• 2015 IEEE International Conference on Data Mining Workshop (ICDMW)
• 2015
This paper proposes an approach that automatically constructs topic hierarchies from microblog data in a bottom-up manner: it first detects topics and then builds the topic structure with a tree combination method.

### Subtopic Ranking Based on Block-Level Document Analysis

• Computer Science
• WEBIST
• 2016
This work proposes methods for ranking the subtopics of a keyword query; the resulting rankings are statistically significantly better than the query completion snapshots of major commercial search engines.

## References

### Discovering and Comparing Topic Hierarchies

• Computer Science
• RIAO
• 2000
The goal is to automatically create domain-specific hierarchies for browsing a document set and locating relevant documents. The work shows that subsumption hierarchies divide documents into smaller groups, so that all relevant documents can be found while examining fewer non-relevant ones.
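To make "subsumption" concrete, here is a minimal sketch. It assumes the common document co-occurrence definition: term x subsumes term y when P(x | y) is at least a threshold and P(y | x) < 1, so x appears in nearly every document containing y but not the reverse. The 0.8 threshold, the function name, and the toy data are assumptions for illustration, not details taken from the summary above.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

def subsumption_pairs(docs: List[Set[str]], terms: List[str],
                      threshold: float = 0.8) -> List[Tuple[str, str]]:
    """Return (parent, child) pairs where the parent term subsumes the child term."""
    # Indices of the documents in which each term occurs.
    occurs: Dict[str, Set[int]] = {
        t: {i for i, d in enumerate(docs) if t in d} for t in terms
    }
    pairs = []
    for x, y in permutations(terms, 2):
        dx, dy = occurs[x], occurs[y]
        if not dx or not dy:
            continue
        both = len(dx & dy)
        p_x_given_y = both / len(dy)   # how often x appears where y appears
        p_y_given_x = both / len(dx)   # how often y appears where x appears
        if p_x_given_y >= threshold and p_y_given_x < 1:
            pairs.append((x, y))
    return pairs

docs = [{"sports", "football"}, {"sports", "football"}, {"sports", "tennis"}, {"sports"}]
print(subsumption_pairs(docs, ["sports", "football", "tennis"]))
# [('sports', 'football'), ('sports', 'tennis')]
```

Chaining these parent-child pairs yields a hierarchy, and each document can be filed under the most specific terms it contains.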

### A practical web-based approach to generating topic hierarchy for text segments

• Computer Science
• CIKM '04
• 2004
This work investigates using highly ranked search-result snippets to enrich the representation of text segments, and proposes a hierarchical clustering algorithm intended to produce a more natural and comprehensive hierarchy.

### Hierarchical Topic Models and the Nested Chinese Restaurant Process

• Computer Science
• NIPS
• 2003
A Bayesian approach is taken to generate an appropriate prior via a distribution on partitions that allows arbitrarily large branching factors and readily accommodates growing data collections.
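The prior described above can be illustrated with a minimal sketch of the nested Chinese restaurant process. It assumes a fixed tree depth and samples only each document's root-to-leaf path, leaving out the topic-word model entirely. At every node, a document picks an existing child with probability proportional to the number of earlier documents that chose it. Alternatively, it opens a new child with probability proportional to the concentration parameter `gamma`. That second option is what permits arbitrarily large branching factors. The function name and parameter values are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Path = Tuple[int, ...]   # a node is identified by its path from the root

def sample_ncrp_paths(num_docs: int, depth: int, gamma: float,
                      seed: int = 0) -> List[Path]:
    """Draw one root-to-leaf path per document from a fixed-depth nested CRP."""
    rng = random.Random(seed)
    visits: Dict[Path, int] = {}            # documents that passed through each node
    children: Dict[Path, List[Path]] = {}   # child nodes created so far
    paths: List[Path] = []
    for _ in range(num_docs):
        node: Path = ()
        for _level in range(depth):
            kids = children.setdefault(node, [])
            # Existing child: weight = its visit count; new child: weight = gamma.
            weights = [visits[k] for k in kids] + [gamma]
            pick = rng.choices(range(len(weights)), weights=weights)[0]
            if pick == len(kids):            # open a new "table", i.e. a new subtopic
                nxt = node + (len(kids),)
                kids.append(nxt)
                visits[nxt] = 0
            else:
                nxt = kids[pick]
            visits[nxt] += 1
            node = nxt
        paths.append(node)
    return paths

paths = sample_ncrp_paths(num_docs=20, depth=3, gamma=1.0, seed=42)
print(paths[:5])
```

Raising `gamma` makes new branches more likely. Adding documents lets the tree keep growing, which matches the "growing data collections" point in the summary.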

### Automatic Keyphrase Extraction via Topic Decomposition

• Computer Science
• EMNLP
• 2010
Topical PageRank (TPR) runs on a word graph to measure word importance with respect to different topics; experiments show that TPR outperforms state-of-the-art keyphrase extraction methods on two datasets under various evaluation metrics.
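The core idea of running PageRank once per topic on a word graph can be sketched as a personalized PageRank. In this sketch, the random jump lands on words according to a per-topic preference distribution. The sketch does not reproduce how TPR derives those preferences from a topic model, nor how it combines word scores into keyphrase scores. The preference dictionaries, the damping value 0.85, and the toy graph are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]   # (source word, target word, weight)

def topical_pagerank(edges: List[Edge], preference: Dict[str, float],
                     damping: float = 0.85, iters: int = 100) -> Dict[str, float]:
    """PageRank whose random jump follows a topic-specific word preference."""
    nodes = sorted({u for u, _, _ in edges} | {v for _, v, _ in edges} | set(preference))
    out_weight = {n: 0.0 for n in nodes}
    for u, _, w in edges:
        out_weight[u] += w
    total = sum(preference.values())
    pref = {n: preference.get(n, 0.0) / total for n in nodes}
    score = dict(pref)
    for _ in range(iters):
        # Nodes without outgoing edges would leak mass here; this toy graph has none.
        new = {n: (1 - damping) * pref[n] for n in nodes}
        for u, v, w in edges:
            new[v] += damping * score[u] * w / out_weight[u]
        score = new
    return score

def undirected(pairs: List[Tuple[str, str]]) -> List[Edge]:
    """Co-occurrence links are symmetric, so add each edge in both directions."""
    return [(a, b, 1.0) for a, b in pairs] + [(b, a, 1.0) for a, b in pairs]

graph = undirected([("neural", "network"), ("network", "query"), ("query", "index")])
ml_topic = topical_pagerank(graph, {"neural": 1.0})   # hypothetical topic preference
ir_topic = topical_pagerank(graph, {"index": 1.0})    # hypothetical topic preference
print({w: round(s, 3) for w, s in ml_topic.items()})
```

The same graph ranks words differently under each topic's preference. That topic-dependence is what distinguishes TPR from a single global PageRank.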

### Topical N-Grams: Phrase and Topic Discovery, with an Application to Information Retrieval

• Computer Science
• Seventh IEEE International Conference on Data Mining (ICDM 2007)
• 2007
Most topic models, such as latent Dirichlet allocation, rely on the bag-of-words assumption. However, word order and phrases are often critical to capturing the meaning of text in many text mining…

### A Phrase-Discovering Topic Model Using Hierarchical Pitman-Yor Processes

• Computer Science
• EMNLP
• 2012
This article presents a hierarchical generative probabilistic model of topical phrases that simultaneously infers the location, length, and topic of phrases within a corpus and relaxes the bag-of-words assumption within phrases by using a hierarchy of Pitman-Yor processes.

### A Graph-Based Algorithm for Inducing Lexical Taxonomies from Scratch

• Computer Science
• IJCAI
• 2011
A graph-based approach that learns a lexical taxonomy automatically, starting from a domain corpus and the Web. It first produces a very dense, cyclic, and possibly disconnected hypernym graph, and then induces a taxonomy from that graph.

### Visualizing Topics with Multi-Word Expressions

• Computer Science
• 2009
A new method for visualizing topics (the distributions over terms that latent variable models automatically extract from large text corpora), based on a language model of arbitrary-length expressions; it outperforms the more standard use of $\chi^2$ and likelihood ratio tests.

### Extracting key terms from noisy and multitheme documents

• Computer Science
• WWW '09
• 2009
Evaluations show that the method outperforms existing methods, producing key terms with higher precision and recall, and that it appears to be substantially more effective than existing methods on noisy and multi-theme documents.

### Automatic labeling hierarchical topics

• Computer Science
• CIKM
• 2012
This paper proposes two effective algorithms that automatically assign concise labels to each topic in a hierarchy by exploiting sibling and parent-child relations among topics, and shows that these inter-topic relations are effective in boosting topic labeling accuracy.