An Aposteriorical Clusterability Criterion for k-Means++ and Simplicity of Clustering

  • Mieczyslaw Alojzy Klopotek
  • Published 24 April 2017
  • Computer Science
  • SN Computer Science
In this paper, the notion of a well-clusterable data set is defined by combining the objective of the k-means clustering algorithm (minimizing the centric spread of data elements) with common sense (clusters shall be separated by gaps). Conditions are identified under which the optimum of the k-means objective coincides with a clustering in which the data are separated by predefined gaps. Two cases are investigated: when whole clusters are separated by some gap, and when only the… 
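The two viewpoints combined in the abstract can be made concrete with a small sketch (illustrative only, not the paper's formal definitions): the k-means objective is the within-cluster sum of squared distances to centroids, while the common-sense view asks for a minimum gap between points of different clusters.

```python
from itertools import combinations

def wcss(clusters):
    """Within-cluster sum of squared distances to cluster centroids,
    i.e. the k-means objective, for 2-D points."""
    total = 0.0
    for pts in clusters:
        cx = sum(p[0] for p in pts) / len(pts)
        cy = sum(p[1] for p in pts) / len(pts)
        total += sum((px - cx) ** 2 + (py - cy) ** 2 for px, py in pts)
    return total

def min_gap(clusters):
    """Smallest Euclidean distance between points in different clusters:
    one simple notion of the "gap" separating clusters."""
    return min(
        ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
        for c1, c2 in combinations(clusters, 2)
        for ax, ay in c1
        for bx, by in c2
    )

# Two tight groups separated by a wide gap: low objective, large gap.
clusters = [[(0, 0), (1, 0), (0, 1)], [(10, 10), (11, 10), (10, 11)]]
print(round(wcss(clusters), 4), round(min_gap(clusters), 4))  # → 2.6667 13.4536
```

On a well-clusterable data set in this informal sense, the partition minimizing `wcss` is also the one whose `min_gap` is large.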

Performance Comparison of K-Means and DBScan Algorithms for Text Clustering Product Reviews

This study compares the clustering accuracy of the K-Means and DBScan algorithms on product reviews and concludes that, for reviews of Cetaphil Facial Wash products, DBScan achieves 99.80% accuracy.

On the Discrepancy Between Kleinberg's Clustering Axioms and k-Means Clustering Algorithm Behavior

This paper investigates Kleinberg's axioms (from both an intuitive and a formal standpoint) as they relate to the well-known k-means clustering method, and shows that certain variations of the consistency axiom are satisfied by k-means.

Research on Accurate Location of Line Loss Anomaly in Substation Area Based on Data Driven

A data-mining-based method is proposed for precisely locating users associated with abnormal line loss; it shows better performance in clustering effectiveness, computation time, and identification accuracy.

High-Dimensional Wide Gap k-Means Versus Clustering Axioms

This work attempts to address the issue of Kleinberg's axioms for distance-based clustering by embedding the data in a high-dimensional space and granting wide gaps between clusters.

Computational Feasibility of Clustering under Clusterability Assumptions

This paper provides a survey of recent papers along this line of research and a critical evaluation of their results, concluding that the CDNM thesis is still far from being formally substantiated.

Clusterability Detection and Initial Seed Selection in Large Data Sets

A graph-based system is presented for detecting clusterability and generating seed information, including an estimate of the value of k (the number of clusters in the data set), an input parameter to many distance-based clustering methods.

Clustering with Spectral Norm and the k-Means Algorithm

  • Amit Kumar, R. Kannan
  • Computer Science, Mathematics
    2010 IEEE 51st Annual Symposium on Foundations of Computer Science
  • 2010
This paper shows that a simple clustering algorithm works without assuming any generative (probabilistic) model, and proves some new results for generative models; e.g., it can cluster all but a small fraction of points assuming only a bound on the variance.

K-Harmonic Means - A Data Clustering Algorithm

KHM is a center-based clustering algorithm that uses the harmonic averages of the distances from each data point to the centers as components of its performance function, and it is demonstrated that K-Harmonic Means is essentially insensitive to the initialization of the centers.
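As a rough illustration (a hedged sketch with names of my own choosing, not the paper's exact formulation), the KHM performance function replaces the min over center distances used by k-means with a harmonic average, so every center influences every point's contribution:

```python
def khm_objective(points, centers, p=2):
    """Sketch of a K-Harmonic-Means-style performance function: for each
    point, take k divided by the sum of inverse p-th powers of its
    distances to the centers (the harmonic average of d^p, scaled by k)."""
    k = len(centers)
    total = 0.0
    for x, y in points:
        inv_sum = sum(
            1.0 / max(((x - cx) ** 2 + (y - cy) ** 2) ** (p / 2), 1e-12)
            for cx, cy in centers
        )
        total += k / inv_sum
    return total

# A point midway between two centers contributes the harmonic mean of its
# (equal) squared distances: d^2 = 1 to each center, so 2 / (1 + 1) = 1.
print(khm_objective([(1, 0)], [(0, 0), (2, 0)]))  # → 1.0
```

Because the harmonic average is dominated by small distances but never ignores the other centers entirely, the surface is smoother than the k-means min, which is one intuition for the reported insensitivity to initialization.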

On the Local Structure of Stable Clustering Instances

It is shown that the widely used Local Search algorithm has strong performance guarantees both for recovering the underlying optimal clustering and for obtaining a clustering of small cost.

Clusterability: A Theoretical Study

This work addresses measures of the clusterability of data sets in a general way, aiming for conclusions that apply regardless of any particular clustering algorithm or specific data-generation model, and proposes a new notion of data clusterability.

An Effective and Efficient Approach for Clusterability Evaluation

A novel approach to clusterability evaluation is proposed that is both computationally efficient and successfully captures the structure of real data, making it the first practical notion of clusterability.

Power k-Means Clustering

This paper explores an alternative to Lloyd's algorithm for k-means clustering that retains its simplicity and mitigates its tendency to get trapped in local minima, embedding the k-means problem in a continuous class of similar, better-behaved problems with fewer local minima.
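A hedged sketch of how such a continuous embedding can work (function names are illustrative, not the paper's): replace the min over squared point-to-center distances with a power mean M_s; for s = 1 this is the ordinary average, and as s → −∞ the power mean approaches the min, recovering the k-means objective, so annealing s traces a family of smoother surrogate problems.

```python
def power_mean(vals, s):
    """Power mean M_s of positive values; M_s -> min(vals) as s -> -inf."""
    return (sum(v ** s for v in vals) / len(vals)) ** (1.0 / s)

def power_kmeans_objective(points, centers, s):
    """Sketch of the annealed surrogate objective: sum over points of the
    power mean of squared distances to the centers."""
    return sum(
        power_mean([(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centers], s)
        for x, y in points
    )

pts = [(1, 0)]
ctrs = [(0, 0), (3, 0)]                        # squared distances: 1 and 4
print(power_kmeans_objective(pts, ctrs, 1))    # arithmetic mean → 2.5
print(power_kmeans_objective(pts, ctrs, -50))  # close to min(1, 4) = 1
```

Optimizing at a mildly negative s and gradually decreasing it lets early iterations see a smoothed landscape before the objective sharpens toward the true k-means cost.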

The Effectiveness of Lloyd-Type Methods for the k-Means Problem

This work investigates variants of Lloyd's heuristic for clustering high dimensional data in an attempt to explain its popularity (a half century after its introduction) among practitioners, and proposes and justifies a clusterability criterion for data sets.

Stability Yields a PTAS for k-Median and k-Means Clustering

Improvements are made to the distance of the clustering found to the target, from $O(\delta)$ to $\delta$, when all target clusters are large; and for $k$-median the authors improve the "largeness" condition needed in the work of Balcan et al. to get exactly $\delta$-close, from $O(\delta n)$ to $\delta n$.