A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm

  title={A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm},
  author={M. E. Celebi and Hassan A. Kingravi and Patricio A. Vela},

Histogram-Based Method for Effective Initialization of the K-Means Clustering Algorithm
Experiments on a diverse collection of data sets from the UCI Machine Learning Repository demonstrate the superiority of the proposed linear, deterministic, and order-invariant initialization method based on multidimensional histograms.
Deterministic Initialization of the k-Means Algorithm using Hierarchical Clustering
Experiments demonstrate that Var-Part and PCA-Part are highly competitive with one of the best random initialization methods to date, i.e. k-means++, and that the proposed approach significantly improves the performance of both hierarchical methods.
Efficient Identification of Initial Clusters Centers for Partitioning Clustering Methods
  • R. Singh, D. Rajpoot
  • Computer Science
    2019 Fifth International Conference on Image Information Processing (ICIIP)
  • 2019
The proposed algorithm for initializing the centers of k-means is compared with six other well-known existing methods and is found to work better than them.
Initialization of k-modes clustering for categorical data
An overview of initialization methods of clustering for numerical and categorical data, with an emphasis on their computational efficiency, is presented, and a new initialization method for categorical data is proposed that can obtain good initial cluster centers using a new distance based on the RD.
A new projection-based K-Means initialization algorithm
A projection-based K-Means initialization algorithm is proposed that first employs a conventional Gaussian kernel density estimation method to find the high-density data areas in one dimension and then iteratively applies density estimation from the lower-variance dimensions to the higher-variance ones until all the dimensions are processed.
Performance Analysis of K-Means Seeding Algorithms
This study evaluates three state-of-the-art initialization methods with three different quality measures, i.e., SSE, the Silhouette Coefficient, and the Adjusted Rand Index, and provides new insight into the performance of initialization methods that traditionally are left behind.
A Comprehensive Survey on Centroid Selection Strategies for Distributed K-means Clustering Algorithm
An overview of existing methods with an emphasis on computational efficiency is presented, along with a comparison of three well-known linear-time initialization methods.
Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm
This chapter investigates the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository and demonstrates that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria.
An entropy-based initialization method of K-means clustering on the optimal number of clusters
This paper defines an entropy-based objective function for the initialization process, which it finds to be better than other existing initialization methods for K-means clustering, and designs an algorithm to calculate the correct number of clusters of data sets using cluster validity indexes.
Improving the Initial Centroids of k-means Clustering Algorithm to Generalize its Applicability
A new method for computing the initial centroids is proposed, which results in better clusters for both uniform and non-uniform data sets.


Initialization of cluster refinement algorithms: a review and comparative study
A controlled benchmark identifies two distance optimization methods, namely SCS and KKZ, as complements of the k-means learning characteristics towards a better cluster separation in the output solution.
The global k-means clustering algorithm
Refining Initial Points for K-Means Clustering
A procedure for computing a refined starting condition from a given initial one that is based on an efficient technique for estimating the modes of a distribution that allows the iterative algorithm to converge to a “better” local minimum.
An Efficient k-Means Clustering Algorithm: Analysis and Implementation
This work presents a simple and efficient implementation of Lloyd's k-means clustering algorithm, which it calls the filtering algorithm, and establishes the practical efficiency of the algorithm's running time.
A Divisive Initialisation Method for Clustering Algorithms
A method for the initialisation step of clustering algorithms is presented, based on the concept of a cluster as a high-density region of points; the estimated centroids are shown to be of good quality compared with a random choice of points.
Careful Seeding Method based on Independent Components Analysis for k-means Clustering
This work proposes a seeding method based on independent component analysis for the k-means clustering method and shows that the normalized mutual information of the proposed method is better than that of the KKZ method and the k-means++ clustering method.