A Novel Benchmark K-Means Clustering on Continuous Data

Abstract

Cluster analysis is one of the prominent techniques in the field of data mining and k-means is one of the most well known popular and partitioned based clustering algorithms. K-means clustering algorithm is widely used in clustering. The performance of k-means algorithm will affect when clustering the continuous data. In this paper, a novel approach for performing k-means clustering on continuous data is proposed. It organizes all the continuous data sets in a sorted structure such that one can find all the data sets which are closest to a given centroid efficiently. The key institution behind this approach is calculating the distance from origin to each data point in the data set. The data sets are portioned into k-equal number of cluster with initial centroids and these are updated all at a time with closest one according to newly calculated distances from the data set. The experimental results demonstrate that proposed approach can improves the computational speed of the direct k-means algorithm in the total number of distance calculations and the overall time of computations particularly in handling continuous data.

1 Figure or Table

Cite this paper

@inproceedings{Prasanna2011ANB, title={A Novel Benchmark K-Means Clustering on Continuous Data}, author={K. Prasanna and Surya Narayana}, year={2011} }