# An Aposteriorical Clusterability Criterion for k-Means++ and Simplicity of Clustering

@article{Klopotek2017AnAC,
title={An Aposteriorical Clusterability Criterion for k-Means++ and Simplicity of Clustering},
author={Mieczyslaw Alojzy Klopotek},
journal={SN Computer Science},
year={2017},
volume={1}
}
• M. Klopotek
• Published 24 April 2017
• Computer Science
• SN Computer Science
In this paper, the notion of a well-clusterable data set is defined by combining the point of view of the objective of the k-means clustering algorithm (minimizing the centric spread of data elements) with common sense (clusters should be separated by gaps). Conditions are identified under which the optimum of the k-means objective coincides with a clustering in which the data is separated by predefined gaps. Two cases are investigated: when the whole clusters are separated by some gap and when only the…
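The two viewpoints named in the abstract can be illustrated with a minimal sketch. It computes the standard k-means cost (the sum of squared distances from each point to its cluster centroid) and a simple gap measure between clusters. The gap used here, the smallest distance between points of different clusters, is an assumption made for illustration; the truncated abstract does not give the paper's own gap conditions. The function names and the toy data are hypothetical as well.

```python
from itertools import combinations
from math import dist


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_cost(clusters):
    """Sum of squared Euclidean distances of points to their cluster centroid."""
    total = 0.0
    for points in clusters:
        mu = centroid(points)
        total += sum(dist(p, mu) ** 2 for p in points)
    return total


def min_gap(clusters):
    """Smallest distance between two points in different clusters.

    Assumption: this is only one possible reading of a 'gap';
    the paper's own definition is not reproduced here.
    """
    return min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )


# Toy data (hypothetical): two unit squares placed far apart.
clusters = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)],
    [(10.0, 0.0), (10.0, 1.0), (11.0, 0.0), (11.0, 1.0)],
]

cost = kmeans_cost(clusters)
gap = min_gap(clusters)
print(f"k-means cost: {cost:.3f}, minimum gap: {gap:.3f}")
```

On this toy data the gap is large relative to the spread within each cluster, which is the intuitive situation the abstract describes; whether such a partition is also the optimum of the k-means objective is the question the paper addresses.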
5 Citations
• Computer Science
SinkrOn
• 2022
The purpose of this study was to compare the accuracy performance of the K-Means and DBScan algorithms in clustering product reviews, and concluded that, in the review clustering of Cetaphil Facial Wash products, DBScan has 99.80% accuracy.
• Computer Science
Machine Learning
• 2023
This paper investigates Kleinberg’s axioms (from both an intuitive and a formal standpoint) as they relate to the well-known k-means clustering method, and shows that these variations of consistency are satisfied by k-means.
• Computer Science
Communications in Computer and Information Science
• 2021
A data mining-based method for precisely locating users associated with abnormal line loss is proposed; it performs better in clustering effectiveness, computation time and identification accuracy.
This work makes an attempt to handle the issue of Kleinberg’s axioms for distance based clustering by embedding in high-dimensional space and granting wide gaps between clusters.

## References


This paper provides a survey of recent papers along this line of research and a critical evaluation of their results, concluding that the CDNM thesis is still far from being formally substantiated.
• Computer Science
• 1999
A graph-based system for detecting clusterability and generating seed information, including an estimate of the value of k (the number of clusters in the data set), an input parameter to many distance-based clustering methods.
• Computer Science, Mathematics
2010 IEEE 51st Annual Symposium on Foundations of Computer Science
• 2010
This paper shows that a simple clustering algorithm works without assuming any generative (probabilistic) model, and proves some new results for generative models - e.g., it can cluster all but a small fraction of points only assuming a bound on the variance.
• Computer Science
• 1999
KHM is a center-based clustering algorithm which uses the Harmonic Averages of the distances from each data point to the centers as components to its performance function and it is demonstrated that K-Harmonic Means is essentially insensitive to the initialization of the centers.
• Computer Science
2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS)
• 2017
It is shown that the widely used Local Search algorithm has strong performance guarantees for both the tasks of recovering the underlying optimal clustering and obtaining a clustering of small cost.
• Computer Science
AISTATS
• 2009
This work addresses measures of the clusterability of data sets with generality, aiming for conclusions that apply regardless of any particular clustering algorithm or any specific data generation model, as well as proposing a new notion of data clusterability.
• Computer Science
ArXiv
• 2016
A novel approach to clusterability evaluation that is both computationally efficient and successfully captures the structure of real data is proposed, demonstrating the success of this approach as the first practical notion of clusterability.
• Computer Science
ICML
• 2019
This paper explores an alternative to Lloyd’s algorithm for k-means clustering that retains its simplicity and mitigates its tendency to get trapped by local minima, and embeds the k-means problem in a continuous class of similar, better behaved problems with fewer local minima.
• Computer Science
2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06)
• 2006
This work investigates variants of Lloyd's heuristic for clustering high dimensional data in an attempt to explain its popularity (a half century after its introduction) among practitioners, and proposes and justifies a clusterability criterion for data sets.
• Computer Science, Mathematics
2010 IEEE 51st Annual Symposium on Foundations of Computer Science
• 2010
Improvements are made to the distance of the clustering found to the target, from $O(\delta)$ to $\delta$, when all target clusters are large, and for $k$-median the authors improve the "largeness" condition needed in the work of Balcan et al. to get exactly $\delta$-close from $O(\delta n)$ to $\delta n$.