A phrase mining framework for recursive construction of a topical hierarchy

@inproceedings{Wang2013APM,
  title={A phrase mining framework for recursive construction of a topical hierarchy},
  author={Chi Wang and Marina Danilevsky and Nihit Desai and Yinan Zhang and Phuong Nguyen and Thrivikrama Taula and Jiawei Han},
  booktitle={Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining},
  year={2013}
}
  • Chi Wang, Marina Danilevsky, Nihit Desai, Yinan Zhang, Phuong Nguyen, Thrivikrama Taula, Jiawei Han
  • Published 11 August 2013
  • Computer Science
  • Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
A high quality hierarchical organization of the concepts in a dataset at different levels of granularity has many valuable applications such as search, summarization, and content browsing. In this paper we propose an algorithm for recursively constructing a hierarchy of topics from a collection of content-representative documents. We characterize each topic in the hierarchy by an integrated ranked list of mixed-length phrases. Our mining framework is based on a phrase-centric view for… 

Scalable and Robust Construction of Topical Hierarchies

A scalable and robust algorithm is proposed for constructing a hierarchy of topics from a text collection based on a tensor orthogonal decomposition technique, which reduces the time of construction by several orders of magnitude and renders it possible for users to interactively revise the hierarchy.

CITPM: A Cluster-Based Iterative Topical Phrase Mining Framework

A novel framework CITPM for topical phrase mining is presented, which views a corpus as a mixture of clusters (domains), and each cluster is characterized by documents sharing similar topical distributions.

Constructing topical hierarchies in heterogeneous information networks

This work presents an algorithm for recursively constructing multi-typed topical hierarchies by a newly designed clustering and ranking algorithm for heterogeneous network data, as well as mining and ranking topical patterns of different types.

Content coverage maximization on word networks for hierarchical topic summarization

A new approach to text modeling via network analysis is proposed; a simple method based on influence analysis proves effective compared with existing generative topic modeling and random-walk-based ranking.

Hierarchical topic map generation for exploratory browsing

This thesis proposes a novel model for automatically generating a topic map for a document corpus with no supervision, which helps the user navigate efficiently through the corpus space and finally land on the desired document.

An Efficient Method for High Quality and Cohesive Topical Phrase Mining

This framework integrates a quality guaranteed phrase mining method, a novel topic model incorporating the constraint of phrases, and a novel document clustering method into an iterative framework to improve both phrase quality and topical cohesion.

TaxoGen: Unsupervised Topic Taxonomy Construction by Adaptive Term Embedding and Clustering

The method, TaxoGen, uses term embeddings and hierarchical clustering to construct a topic taxonomy in a recursive fashion and consists of an adaptive spherical clustering module for allocating terms to proper levels when splitting a coarse topic into fine-grained ones.

TaxoGen: Constructing Topical Concept Taxonomy by Adaptive Term Embedding and Clustering

The method, TaxoGen, uses term embeddings and hierarchical clustering to construct a topic taxonomy in a recursive fashion and consists of an adaptive spherical clustering module for allocating terms to proper levels when splitting a coarse topic into fine-grained ones.

Constructing Topic Hierarchies from Social Media Data

This paper proposes an approach to automatically construct topic hierarchies from microblog data in a bottom-up manner: it detects topics first and then builds the topic structure using a tree combination method.

Subtopic Ranking Based on Block-Level Document Analysis

This work proposes methods for ranking subtopics of a keyword query that generated rankings statistically significantly better than the query completion snapshots by major commercial search engines.
...

References

SHOWING 1-10 OF 36 REFERENCES

Discovering and Comparing Topic Hierarchies

The goal is to automatically create domain-specific hierarchies that can be used for browsing a document set and locating relevant documents. The work shows that subsumption hierarchies divide documents into smaller groups, allowing one to find all relevant documents without looking at as many non-relevant documents.

A practical web-based approach to generating topic hierarchy for text segments

This work investigates the possibilities of using highly ranked search-result snippets to enrich the representation of text segments and proposes a hierarchical clustering algorithm, which tries to produce a more natural and comprehensive hierarchy.

Hierarchical Topic Models and the Nested Chinese Restaurant Process

A Bayesian approach is taken to generate an appropriate prior via a distribution on partitions that allows arbitrarily large branching factors and readily accommodates growing data collections.

Automatic Keyphrase Extraction via Topic Decomposition

Topical PageRank (TPR) is built on a word graph to measure word importance with respect to different topics; the paper shows that TPR outperforms state-of-the-art keyphrase extraction methods on two datasets under various evaluation metrics.

Topical N-Grams: Phrase and Topic Discovery, with an Application to Information Retrieval

Most topic models, such as latent Dirichlet allocation, rely on the bag-of-words assumption. However, word order and phrases are often critical to capturing the meaning of text in many text mining

A Phrase-Discovering Topic Model Using Hierarchical Pitman-Yor Processes

This article presents a hierarchical generative probabilistic model of topical phrases that simultaneously infers the location, length, and topic of phrases within a corpus and relaxes the bag-of-words assumption within phrases by using a hierarchy of Pitman-Yor processes.

A Graph-Based Algorithm for Inducing Lexical Taxonomies from Scratch

A graph-based approach for automatically learning a lexical taxonomy, starting from a domain corpus and the Web. It first produces a very dense, cyclic, and possibly disconnected hypernym graph, and then induces a taxonomy from that graph.

Visualizing Topics with Multi-Word Expressions

A new method for visualizing topics, the distributions over terms that are automatically extracted from large text corpora using latent variable models, based on a language model of arbitrary length expressions, which outperforms the more standard use of $\chi^2$ and likelihood ratio tests.

Extracting key terms from noisy and multitheme documents

Evaluations of the method show that it outperforms existing methods producing key terms with higher precision and recall, and appears to be substantially more effective on noisy and multi-theme documents than existing methods.

Automatic labeling hierarchical topics

This paper proposes two effective algorithms that automatically assign concise labels to each topic in a hierarchy by exploiting sibling and parent-child relations among topics and shows that the inter-topic relation is effective in boosting topic labeling accuracy.