Finding the Number of Clusters in a Dataset Using an Information Theoretic Hierarchical Algorithm

Abstract

One of the most challenging problems in clustering is detecting the exact number of clusters in a dataset. Most previous methods for this problem estimate the number of clusters with model-based algorithms, which cannot detect all types of clusters and have difficulty detecting coupled clusters. In this paper we propose a new method for finding the number of clusters in a dataset that uses information theory and a top-down hierarchical clustering algorithm. The algorithm starts from a large number of clusters, removes one cluster in each iteration, and allocates that cluster's data points to the remaining clusters. Finally, the exact number of clusters in the dataset is detected by measuring the Information Potential. The method shows high capability and stability in detecting the number of clusters even in complex datasets, and it is also computationally efficient. We show the effectiveness of the proposed method through experiments on several artificial and real datasets and by comparing its results with two recently developed methods for finding the number of clusters in a dataset. The comparisons show the superiority of the proposed method.
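The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the initial clusters are formed, which cluster is removed, how its points are reallocated, or how the Information Potential is turned into a decision. The sketch therefore assumes random initial centres, removal of the smallest cluster, reallocation to the nearest remaining cluster mean, and a Gaussian Parzen-window estimate of the (cross) information potential with a hypothetical kernel width `sigma`. The function names `information_potential` and `reduce_clusters` are illustrative only.

```python
import numpy as np


def information_potential(a, b, sigma=1.0):
    """Parzen-window estimate of the (cross) information potential.

    Returns the mean Gaussian-kernel value over all pairs (row of a, row of b).
    The pairwise kernel has variance 2 * sigma**2 because two Gaussian windows
    of width sigma are convolved. sigma is an assumed, user-chosen width.
    """
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    dim = a.shape[1]
    var = 2.0 * sigma ** 2
    norm = (2.0 * np.pi * var) ** (-dim / 2.0)
    return float(norm * np.exp(-d2 / (2.0 * var)).mean())


def reduce_clusters(x, k_start=8, sigma=1.0, seed=0):
    """Start with many clusters and remove one per iteration.

    Returns a dict mapping the current number of clusters to the mean
    between-cluster information potential of that partition.
    """
    rng = np.random.default_rng(seed)
    # Assumption: initial partition from k_start randomly chosen data points.
    centres = x[rng.choice(len(x), size=k_start, replace=False)]
    labels = np.argmin(((x[:, None, :] - centres[None, :, :]) ** 2).sum(-1), axis=1)

    history = {}
    while True:
        ids = np.unique(labels)
        if len(ids) < 2:
            break
        # Score the current partition: average cross potential over cluster pairs.
        cross = [
            information_potential(x[labels == p], x[labels == q], sigma)
            for i, p in enumerate(ids)
            for q in ids[i + 1:]
        ]
        history[len(ids)] = float(np.mean(cross))

        # Assumption: the smallest cluster is the one removed.
        sizes = [(labels == c).sum() for c in ids]
        victim = ids[int(np.argmin(sizes))]
        rest = np.array([c for c in ids if c != victim])

        # Assumption: its points go to the nearest remaining cluster mean.
        means = np.stack([x[labels == c].mean(axis=0) for c in rest])
        pts = np.where(labels == victim)[0]
        nearest = np.argmin(((x[pts][:, None, :] - means[None, :, :]) ** 2).sum(-1), axis=1)
        labels[pts] = rest[nearest]
    return history
```

Calling `reduce_clusters(x, k_start=8)` on an `(n, d)` NumPy array returns one score per cluster count from `k_start` down to 2. How those scores should be read to select the final number of clusters is left open here, since the abstract does not state the paper's criterion.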

DOI: 10.1109/ICECS.2006.379729



Semantic Scholar estimates that this publication has 110 citations based on the available data.


Cite this paper

@inproceedings{Aghagolzadeh2006FindingTN,
  title={Finding the Number of Clusters in a Dataset Using an Information Theoretic Hierarchical Algorithm},
  author={Mehdi Aghagolzadeh and Hamid Soltanian-Zadeh and Babak Nadjar Araabi and Ali Aghagolzadeh},
  booktitle={ICECS},
  year={2006}
}