# A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm

@article{Celebi2013ACS, title={A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm}, author={M. E. Celebi and Hassan A. Kingravi and Patricio A. Vela}, journal={ArXiv}, year={2013}, volume={abs/1209.1960} }

## 822 Citations

Histogram-Based Method for Effective Initialization of the K-Means Clustering Algorithm

- Computer Science
- FLAIRS Conference
- 2014

Experiments on a diverse collection of data sets from the UCI Machine Learning Repository demonstrate the superiority of the proposed linear, deterministic, and order-invariant initialization method based on multidimensional histograms.

Deterministic Initialization of the k-Means Algorithm using Hierarchical Clustering

- Computer Science
- Int. J. Pattern Recognit. Artif. Intell.
- 2012

Experiments demonstrate that Var-Part and PCA-Part are highly competitive with one of the best random initialization methods to date, i.e. k-means++, and that the proposed approach significantly improves the performance of both hierarchical methods.
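For context on the k-means++ baseline mentioned above, here is a minimal sketch of its seeding step (D² sampling) in plain Python. The function names `sq_dist` and `kmeans_pp_seeds` and the `rng` parameter are illustrative assumptions, not taken from any of the listed papers, and the sketch covers only seeding, not the Var-Part or PCA-Part methods.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng=None):
    """Pick k initial centers by D^2 sampling (hypothetical helper)."""
    if rng is None:
        rng = random.Random(0)
    # First center: chosen uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            # Sample a point with probability proportional to d2.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
            centers.append(points[idx])
        d2 = [min(old, sq_dist(p, centers[-1])) for old, p in zip(d2, points)]
    return centers
```

Because a point that is already a center has weight zero, it cannot be drawn again while any other point has positive weight. The result still depends on the random draws, which is the contrast with the deterministic methods this entry proposes.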

Efficient Identification of Initial Clusters Centers for Partitioning Clustering Methods

- Computer Science
- 2019 Fifth International Conference on Image Information Processing (ICIIP)
- 2019

The proposed algorithm for initializing the centers of k-means is compared with six other well-known existing methods and is found to work better than them.

Initialization of k-modes clustering for categorical data

- Computer Science
- 2013 International Conference on Management Science and Engineering 20th Annual Conference Proceedings
- 2013

An overview of initialization methods for clustering numerical and categorical data, with an emphasis on their computational efficiency, is presented, and a new initialization method for categorical data is proposed, which obtains good initial cluster centers using a new distance based on the RD.

A new projection-based K-Means initialization algorithm

- Computer Science
- 2016 IEEE Chinese Guidance, Navigation and Control Conference (CGNCC)
- 2016

A projection-based K-Means initialization algorithm that first employs conventional Gaussian kernel density estimation to find the high-density data areas in one dimension and then iteratively applies density estimation from the lower-variance dimensions to the higher-variance ones until all the dimensions are processed.

Performance Analysis of K-Means Seeding Algorithms

- Computer Science
- 2019 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC)
- 2019

This study evaluates three state-of-the-art initialization methods with three different quality measures, i.e., SSE, the Silhouette Coefficient, and the Adjusted Rand Index, and provides new insight into the performance of initialization methods that traditionally are left behind.
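Two of the quality measures named above can be sketched in a few lines of plain Python. The function names `sse` and `adjusted_rand_index` are illustrative assumptions, the ARI sketch assumes at least two labeled points, and the Silhouette Coefficient is omitted for brevity.

```python
from collections import Counter
from math import comb


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(
        min(sum((x - c) ** 2 for x, c in zip(p, ctr)) for ctr in centers)
        for p in points
    )


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two labelings of the same points."""
    n = len(labels_a)
    # Pairs of points placed together in both labelings.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each labeling separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put everything in one cluster.
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(sse(points, [(0, 1), (10, 1)]))  # 4: each point is 1 unit from its center
print(sse(points, [(5, 1)]))           # 104: each point contributes 25 + 1
```

SSE needs only the data and the centers, whereas the Adjusted Rand Index requires ground-truth labels, so the two measures answer different questions about an initialization.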

A Comprehensive Survey on Centroid Selection Strategies for Distributed K-means Clustering Algorithm

- Computer Science
- 2015

An overview of existing methods with an emphasis on computational efficiency is presented, along with a comparison of three well-known linear-time initialization methods.

Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

- Computer Science
- ArXiv
- 2014

This chapter investigates the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository and demonstrates that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria.

An entropy-based initialization method of K-means clustering on the optimal number of clusters

- Computer Science
- Neural Comput. Appl.
- 2021

This paper defines an entropy-based objective function for the initialization process, which is reported to outperform other existing K-means initialization methods, and designs an algorithm that calculates the correct number of clusters in a dataset using cluster validity indexes.

Improving the Initial Centroids of k-means Clustering Algorithm to Generalize its Applicability

- Computer Science
- 2014

A new method to formulate the initial centroids is proposed which results in better clusters equally for uniform and non-uniform data sets.

## References

An initialization method for the K-Means algorithm using neighborhood model

- Computer Science
- Comput. Math. Appl.
- 2009

Initialization of cluster refinement algorithms: a review and comparative study

- Computer Science
- 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541)
- 2004

A controlled benchmark identifies two distance optimization methods, namely SCS and KKZ, as complements of the k-means learning characteristics towards a better cluster separation in the output solution.
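As a rough illustration of one of the two methods named above, the following sketches KKZ seeding as it is commonly described: the first center is the point with the largest norm, and each later center is the point farthest from its nearest already-chosen center. The function names are illustrative assumptions, and SCS is not sketched here.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kkz_seeds(points, k):
    """Deterministic maximin seeding in the style of KKZ (hypothetical helper)."""
    # First center: the point with the largest Euclidean norm.
    first = max(points, key=lambda p: sum(x * x for x in p))
    centers = [first]
    # d2[i] is the squared distance from points[i] to its nearest center.
    d2 = [sq_dist(p, first) for p in points]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        idx = max(range(len(points)), key=d2.__getitem__)
        centers.append(points[idx])
        d2 = [min(old, sq_dist(p, points[idx])) for old, p in zip(d2, points)]
    return centers


points = [(0, 0), (1, 0), (10, 0), (10, 1), (0, 9)]
print(kkz_seeds(points, 3))  # [(10, 1), (0, 9), (1, 0)]
```

The procedure is deterministic for a fixed point order, but because it always picks extreme points it tends to select outliers as centers.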

A method for initialising the K-means clustering algorithm using kd-trees

- Computer Science
- Pattern Recognit. Lett.
- 2007

Refining Initial Points for K-Means Clustering

- Computer Science
- ICML
- 1998

A procedure for computing a refined starting condition from a given initial one that is based on an efficient technique for estimating the modes of a distribution that allows the iterative algorithm to converge to a “better” local minimum.

A Divisive Initialisation Method for Clustering Algorithms

- Computer Science
- PKDD
- 1999

A method for the initialisation step of clustering algorithms is presented, based on the concept of cluster as a high density region of points, which shows the good quality of the estimated centroids with respect to the random choice of points.

An Efficient k-Means Clustering Algorithm: Analysis and Implementation

- Computer Science
- IEEE Trans. Pattern Anal. Mach. Intell.
- 2002

This work presents a simple and efficient implementation of Lloyd's k-means clustering algorithm, which it calls the filtering algorithm, and establishes the practical efficiency of the algorithm's running time.

Robust partitional clustering by outlier and density insensitive seeding

- Computer Science
- Pattern Recognit. Lett.
- 2009

An empirical comparison of four initialization methods for the K-Means algorithm

- Computer Science
- Pattern Recognit. Lett.
- 1999

Careful Seeding Method based on Independent Components Analysis for k-means Clustering

- Computer Science
- 2012

This work proposes a seeding method for k-means clustering that is based on independent component analysis and shows that the normalized mutual information of the proposed method is better than that of the KKZ method and the k-means++ clustering method.