An Aposteriorical Clusterability Criterion for k-Means++ and Simplicity of Clustering
@article{Klopotek2017AnAC,
  title={An Aposteriorical Clusterability Criterion for k-Means++ and Simplicity of Clustering},
  author={Mieczyslaw Alojzy Klopotek},
  journal={SN Computer Science},
  year={2017},
  volume={1}
}
In this paper, the notion of a well-clusterable data set is defined by combining the point of view of the objective of the k-means clustering algorithm (minimizing the centric spread of data elements) with common sense (clusters should be separated by gaps). Conditions are identified under which the optimum of the k-means objective coincides with a clustering in which the data is separated by predefined gaps. Two cases are investigated: when whole clusters are separated by some gap and when only the…
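The two ingredients the abstract combines, the k-means objective (centric spread) and gaps between clusters, can be computed for a given partition as in the minimal Python sketch below. It assumes NumPy and Euclidean distance; the helper names `kmeans_cost`, `min_gap` and `max_radius` are hypothetical, and comparing the gap with the cluster radius is only an illustration, not the paper's clusterability criterion.

```python
import numpy as np

# Hypothetical helpers for illustration; not the paper's exact criterion.

def kmeans_cost(X, labels):
    """k-means objective for a fixed partition: sum of squared Euclidean
    distances of the points to their own cluster centroid."""
    cost = 0.0
    for c in np.unique(labels):
        pts = X[labels == c]
        cost += ((pts - pts.mean(axis=0)) ** 2).sum()
    return float(cost)

def min_gap(X, labels):
    """Smallest Euclidean distance between two points in different clusters."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    different = labels[:, None] != labels[None, :]
    return float(d[different].min())

def max_radius(X, labels):
    """Largest distance from any point to the centroid of its own cluster."""
    r = 0.0
    for c in np.unique(labels):
        pts = X[labels == c]
        r = max(r, float(np.linalg.norm(pts - pts.mean(axis=0), axis=1).max()))
    return r

# Two tight clusters separated by a wide gap.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
labels = np.array([0, 0, 1, 1])
print(kmeans_cost(X, labels), min_gap(X, labels), max_radius(X, labels))  # 1.0 10.0 0.5
```

Here the gap (10.0) is twenty times the largest cluster radius (0.5), the kind of situation in which one expects the k-means optimum to agree with the gap-separated partition.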
5 Citations
Performance Comparison of K-Means and DBScan Algorithms for Text Clustering Product Reviews
- Computer Science · SinkrOn
- 2022
This study compares the accuracy of the K-Means and DBScan algorithms for clustering product reviews and concludes that, for reviews of Cetaphil Facial Wash products, DBScan achieves 99.80% accuracy.
On the Discrepancy Between Kleinberg's Clustering Axioms and k-Means Clustering Algorithm Behavior
- Computer Science · Machine Learning
- 2023
This paper investigates Kleinberg’s axioms (from both an intuitive and a formal standpoint) as they relate to the well-known k-means clustering method, and shows that certain variations of the consistency axiom are satisfied by k-means.
Research on Accurate Location of Line Loss Anomaly in Substation Area Based on Data Driven
- Computer Science · Communications in Computer and Information Science
- 2021
A data-mining-based method is proposed for precisely locating users associated with abnormal line loss; it shows better performance in clustering effectiveness, computation time and identification accuracy.
High-Dimensional Wide Gap k-Means Versus Clustering Axioms
- Computer Science, Economics · ArXiv
- 2022
This work attempts to address the issue of Kleinberg’s axioms for distance-based clustering by embedding the data in a high-dimensional space and granting wide gaps between clusters.
References (showing 10 of 28)
Computational Feasibility of Clustering under Clusterability Assumptions
- Computer Science · ArXiv
- 2015
This paper surveys recent papers along this line of research and critically evaluates their results, concluding that the CDNM thesis ("clustering is difficult only when it does not matter") is still far from being formally substantiated.
Clusterability Detection and Initial Seed Selection in Large Data Sets
- Computer Science
- 1999
A graph-based system is presented for detecting clusterability and generating seed information, including an estimate of the value of k (the number of clusters in the data set), an input parameter to many distance-based clustering methods.
Clustering with Spectral Norm and the k-Means Algorithm
- Computer Science, Mathematics · 2010 IEEE 51st Annual Symposium on Foundations of Computer Science
- 2010
This paper shows that a simple clustering algorithm works without assuming any generative (probabilistic) model, and proves some new results for generative models; e.g., it can cluster all but a small fraction of points assuming only a bound on the variance.
K-Harmonic Means - A Data Clustering Algorithm
- Computer Science
- 1999
K-Harmonic Means (KHM) is a center-based clustering algorithm that uses the harmonic averages of the distances from each data point to the centers as components of its performance function; the authors demonstrate that KHM is essentially insensitive to the initialization of the centers.
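The performance function described above can be sketched in Python as follows, assuming the common form that sums, over all points, K divided by the sum of inverse p-th powers of the point-to-center distances (p = 2 gives squared distances). The name `khm_performance`, the exposed exponent `p` and the `eps` guard are assumptions of this sketch, not taken from the paper.

```python
import numpy as np

def khm_performance(X, C, p=2.0, eps=1e-12):
    """Hypothetical K-Harmonic Means performance value:
    sum_i K / sum_k (1 / ||x_i - c_k||**p)   (lower is better).
    Each term is the harmonic average of the point's distances**p to all
    K centers, so every center influences every point. `eps` (an assumption
    of this sketch) avoids division by zero when a point sits on a center."""
    X = np.asarray(X, dtype=float)
    C = np.asarray(C, dtype=float)
    K = len(C)
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=-1)  # shape (n, K)
    d = np.maximum(d, eps)
    return float((K / (1.0 / d ** p).sum(axis=1)).sum())

# One point at distances 1 and 2 from two centers: 2 / (1/1 + 1/4) = 1.6
print(khm_performance([[0, 0]], [[1, 0], [0, 2]]))
```

Unlike the k-means objective, which keeps only the nearest center per point, the harmonic average mixes in all centers, which is the intuition behind the reported insensitivity to initialization.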
On the Local Structure of Stable Clustering Instances
- Computer Science · 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS)
- 2017
The paper shows that the widely used Local Search algorithm has strong performance guarantees for both recovering the underlying optimal clustering and obtaining a clustering of small cost.
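For readers unfamiliar with Local Search for k-means, a minimal Python sketch of the single-swap variant is given below, with candidate centers restricted to data points. The swap size, the discrete candidate set and the helper names `cost` and `local_search` are assumptions chosen for brevity; the paper's analysis may concern a different variant.

```python
import itertools
import numpy as np

def cost(X, centers):
    """k-means cost: squared distance of each point to its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return float(d2.min(axis=1).sum())

def local_search(X, k, init_idx=None):
    """Hypothetical single-swap local search over centers drawn from the data.
    Swap one current center for one non-center point whenever the cost
    strictly decreases; stop when no single swap improves (local optimum)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    S = list(init_idx) if init_idx is not None else list(range(k))
    best = cost(X, X[S])
    improved = True
    while improved:
        improved = False
        for i, j in itertools.product(range(len(S)), range(n)):
            if j in S:
                continue
            T = S.copy()
            T[i] = j  # replace the i-th center by data point j
            c = cost(X, X[T])
            if c < best - 1e-12:
                S, best, improved = T, c, True
                break  # restart the scan from the improved solution
    return sorted(S), best

# Start with both centers in the same cluster; one swap fixes it.
X = [[0, 0], [0, 1], [10, 0], [10, 1]]
print(local_search(X, 2, init_idx=[0, 1]))  # ([1, 2], 2.0)
```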
Clusterability: A Theoretical Study
- Computer Science · AISTATS
- 2009
This work addresses measures of the clusterability of data sets in general terms, aiming for conclusions that apply regardless of any particular clustering algorithm or any specific data-generation model, and proposes a new notion of data clusterability.
An Effective and Efficient Approach for Clusterability Evaluation
- Computer Science · ArXiv
- 2016
A novel approach to clusterability evaluation is proposed that is both computationally efficient and successfully captures the structure of real data; the authors present it as the first practical notion of clusterability.
Power k-Means Clustering
- Computer Science · ICML
- 2019
This paper explores an alternative to Lloyd’s algorithm for k-means clustering that retains its simplicity and mitigates its tendency to get trapped in local minima, by embedding the k-means problem in a continuous class of similar, better-behaved problems with fewer local minima.
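One way to picture this continuous class of problems is to replace the minimum over centers by a power mean $M_s$ with exponent $s < 0$, which tends to the minimum as $s \to -\infty$. The minimal Python sketch below evaluates such a surrogate objective; the names `power_mean` and `power_kmeans_objective` are hypothetical, and the paper's annealing schedule for $s$ and its center updates are not reproduced.

```python
import numpy as np

def power_mean(y, s):
    """Power mean M_s(y) = ((1/k) * sum_j y_j**s) ** (1/s), for s < 0.
    Computed in log space (log-sum-exp) so that large |s| does not overflow."""
    if s >= 0:
        raise ValueError("this sketch assumes s < 0")
    y = np.asarray(y, dtype=float)
    if np.any(y == 0):
        return 0.0  # limit value: a zero entry drives M_s to 0 when s < 0
    a = s * np.log(y)  # log(y_j ** s)
    m = a.max()
    log_mean = m + np.log(np.exp(a - m).sum()) - np.log(len(y))
    return float(np.exp(log_mean / s))

def power_kmeans_objective(X, centers, s):
    """Sum over points of the power mean of squared distances to all centers.
    As s -> -inf this approaches the ordinary k-means objective (min over centers)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return float(sum(power_mean(row, s) for row in d2))

print(power_mean([1.0, 4.0], -1.0))   # harmonic mean of 1 and 4: 1.6
print(power_mean([1.0, 4.0], -500.0))  # close to min(1, 4) = 1
```

Smaller (more negative) $s$ makes the surrogate closer to k-means but also more rugged, which is why gradually decreasing $s$ is a natural design.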
The Effectiveness of Lloyd-Type Methods for the k-Means Problem
- Computer Science · 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06)
- 2006
This work investigates variants of Lloyd's heuristic for clustering high-dimensional data in an attempt to explain its popularity among practitioners (half a century after its introduction), and proposes and justifies a clusterability criterion for data sets.
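For reference, the basic Lloyd iteration that these variants build on alternates two steps: assign each point to its nearest center, then move each center to the mean of its assigned points. The minimal Python sketch below takes the initial centers as an argument; it does not reproduce the seeding procedures the paper studies, and the function name `lloyd` is hypothetical.

```python
import numpy as np

def lloyd(X, init_centers, max_iter=100):
    """Plain Lloyd iteration: nearest-center assignment followed by
    centroid update, repeated until the assignment stops changing."""
    X = np.asarray(X, dtype=float)
    C = np.asarray(init_centers, dtype=float).copy()
    labels = None
    for _ in range(max_iter):
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
        new_labels = d2.argmin(axis=1)  # assignment step
        if labels is not None and np.array_equal(new_labels, labels):
            break  # converged: assignments unchanged
        labels = new_labels
        for j in range(len(C)):  # update step
            members = X[labels == j]
            if len(members) > 0:  # keep an empty cluster's center unchanged
                C[j] = members.mean(axis=0)
    return C, labels

C, labels = lloyd([[0, 0], [0, 1], [10, 0], [10, 1]], [[0, 0], [10, 0]])
print(C.tolist(), labels.tolist())  # [[0.0, 0.5], [10.0, 0.5]] [0, 0, 1, 1]
```

Each step can only lower the k-means cost, so the iteration terminates, but only at a local optimum that depends on the initial centers.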
Stability Yields a PTAS for k-Median and k-Means Clustering
- Computer Science, Mathematics · 2010 IEEE 51st Annual Symposium on Foundations of Computer Science
- 2010
The distance of the clustering found to the target is improved from $O(\delta)$ to $\delta$ when all target clusters are large, and for $k$-median the authors improve the "largeness" condition needed in the work of Balcan et al. to get exactly $\delta$-close from $O(\delta n)$ to $\delta n$.