Hierarchical Bayesian Clustering for Automatic Text Classification

Abstract

Text classification, the grouping of texts into several clusters, has been used as a means of improving both the efficiency and the effectiveness of text retrieval/categorization. In this paper we propose a hierarchical clustering algorithm that constructs a set of clusters having the maximum Bayesian posterior probability, the probability that the given texts are classified into clusters. We call the algorithm Hierarchical Bayesian Clustering (HBC). The advantages of HBC are experimentally verified from several viewpoints: (1) HBC can reconstruct the original clusters more accurately than other, non-probabilistic algorithms; (2) when a probabilistic text categorization is extended to a cluster-based one, the use of HBC offers better performance than the use of non-probabilistic algorithms.
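The idea of building clusters bottom-up by repeatedly choosing the merge that most increases a probabilistic score can be illustrated with a small sketch. The abstract does not give the paper's probability model, so everything below is an assumption made for illustration: documents are bags of words, each cluster is scored by the log-likelihood of its pooled term counts under an add-alpha smoothed unigram model, a uniform prior over clusterings is assumed so that this likelihood stands in for the posterior, and the function names are hypothetical. It is not the authors' estimator.

```python
import math
from collections import Counter
from itertools import combinations


def cluster_log_score(doc_counts, vocab_size, alpha=1.0):
    """Log-likelihood of a cluster's documents under a unigram model
    estimated from the cluster's own pooled counts (add-alpha smoothing).
    ASSUMPTION: this model is illustrative, not taken from the paper."""
    pooled = Counter()
    for counts in doc_counts:
        pooled.update(counts)
    denom = sum(pooled.values()) + alpha * vocab_size
    return sum(n * math.log((n + alpha) / denom) for n in pooled.values())


def greedy_probabilistic_clustering(docs, k, alpha=1.0):
    """Agglomerative clustering: start from singletons and repeatedly
    merge the pair whose merge yields the largest gain in total log score,
    until k clusters remain. Returns lists of document indices."""
    vocab_size = len({term for d in docs for term in d})
    clusters = [[i] for i in range(len(docs))]

    def score(members):
        return cluster_log_score([docs[i] for i in members], vocab_size, alpha)

    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            gain = (score(clusters[a] + clusters[b])
                    - score(clusters[a]) - score(clusters[b]))
            if best is None or gain > best[0]:
                best = (gain, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return clusters


# Toy usage: two fruit-themed and two car-themed documents (made-up data).
docs = [
    Counter("apple banana apple".split()),
    Counter("banana apple".split()),
    Counter("car engine car".split()),
    Counter("engine car wheel".split()),
]
clusters = greedy_probabilistic_clustering(docs, k=2)
print(sorted(sorted(c) for c in clusters))
```

The sketch rescans every pair at each step, which is quadratic per merge and only suitable for small collections; it is meant to show the greedy score-maximizing structure, not an efficient implementation.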

Statistics

[Chart: Citations per Year, 1996 to 2016]

fewer than 50 Citations

Semantic Scholar estimates that this publication has 50 citations based on the available data.

Cite this paper

@inproceedings{Iwayama1995HierarchicalBC,
  title={Hierarchical Bayesian Clustering for Automatic Text Classification},
  author={Makoto Iwayama and Takenobu Tokunaga},
  booktitle={IJCAI},
  year={1995}
}