How slow is the k-means method?

@inproceedings{arthur2006slow,
  title={How slow is the k-means method?},
  author={David Arthur and Sergei Vassilvitskii},
  booktitle={SCG '06},
  year={2006}
}
The k-means method is an old but popular clustering algorithm known for its observed speed and its simplicity. Until recently, however, no meaningful theoretical bounds were known on its running time. In this paper, we demonstrate that the worst-case running time of k-means is superpolynomial by improving the best known lower bound from Ω(n) iterations to 2^Ω(√n).
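The iteration whose running time is bounded above is Lloyd's k-means method: alternate between assigning each point to its nearest center and moving each center to the mean of its cluster, until nothing changes. A minimal NumPy sketch (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def lloyd_kmeans(points, centers, max_iter=100):
    """One run of Lloyd's k-means; returns (centers, labels, iterations used)."""
    for it in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points
        # (empty clusters keep their old center in this sketch).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            return new_centers, labels, it + 1  # converged
        centers = new_centers
    return centers, labels, max_iter
```

The lower bound in the paper is about how many of these assign-and-update rounds an adversarial input can force before convergence.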
k-Means Has Polynomial Smoothed Complexity
The smoothed running time of the k-means method is settled, showing that the smoothed number of iterations is bounded by a polynomial in n and 1/σ, where σ is the standard deviation of the Gaussian perturbations.
Smoothed Analysis of the k-Means Method
The smoothed running time of the k-means method is settled, showing that the smoothed number of iterations is bounded by a polynomial in n and 1/σ, where σ is the standard deviation of the Gaussian perturbations.
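In the smoothed model used by these papers, an adversary fixes the input and each coordinate is then independently perturbed by Gaussian noise N(0, σ²); the smoothed iteration count is the expected number of Lloyd iterations over that noise. A small experimental sketch of this model (assumed helper names, not code from the papers):

```python
import numpy as np

def count_lloyd_iterations(points, centers, max_iter=10_000):
    """Run Lloyd's k-means and return how many iterations it takes to converge."""
    for it in range(1, max_iter + 1):
        labels = np.linalg.norm(points[:, None] - centers[None, :], axis=2).argmin(axis=1)
        new = np.array([points[labels == j].mean(axis=0) if (labels == j).any() else centers[j]
                        for j in range(len(centers))])
        if np.allclose(new, centers):
            return it
        centers = new
    return max_iter

def smoothed_iterations(points, centers, sigma, trials, seed=0):
    """Average iteration count when each coordinate of the (adversarial) input
    is perturbed by independent N(0, sigma^2) noise -- the smoothed model."""
    rng = np.random.default_rng(seed)
    runs = [count_lloyd_iterations(points + rng.normal(0.0, sigma, points.shape), centers)
            for _ in range(trials)]
    return float(np.mean(runs))
```

The theorem above says this average is bounded by a polynomial in n and 1/σ, even though the unperturbed worst case is superpolynomial.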
k-means requires exponentially many iterations even in the plane
This work proves super-polynomial lower bounds for any d ≥ 2 and improves the lower bound by presenting a simple construction in the plane that leads to the exponential lower bound 2^Ω(n).
The Complexity of the k-means Method
It is proved that the k-means method can implicitly solve PSPACE-complete problems, providing a complexity-theoretic explanation for its worst-case running time.
Theoretical Analysis of the k-Means Algorithm - A Survey
This paper surveys the recent results in this direction, as well as several extensions of the basic k-means method that can be used to improve the algorithm.
On the Lower Bound of Local Optimums in K-Means Algorithm
This paper proposes an efficient method to compute a lower bound on the cost of the local optimum from the current center set and shows that this method can greatly prune the unnecessary iterations and improve the efficiency of the algorithm in most of the data sets, especially with high dimensionality and large k.
k-means++: the advantages of careful seeding
By augmenting k-means with a very simple, randomized seeding technique, this work obtains an algorithm that is O(log k)-competitive with the optimal clustering.
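The seeding technique in question is D² sampling: the first center is chosen uniformly at random, and each subsequent center is a data point chosen with probability proportional to its squared distance from the nearest center already picked. A minimal sketch (function name is illustrative):

```python
import numpy as np

def kmeans_pp_seed(points, k, rng):
    """k-means++ seeding: first center uniform at random, then each new
    center x sampled with probability proportional to D(x)^2, the squared
    distance from x to the nearest center chosen so far."""
    n = len(points)
    centers = [points[rng.integers(n)]]
    for _ in range(k - 1):
        # D(x)^2 for every point, against the centers picked so far.
        d2 = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[rng.choice(n, p=d2 / d2.sum())])
    return np.array(centers)
```

These k initial centers are then handed to the ordinary k-means iteration; the D² bias toward far-away points is what yields the O(log k) competitiveness guarantee.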
A Tight Lower Bound Instance for k-means++ in Constant Dimension
The k-means++ seeding algorithm is one of the most popular algorithms used for finding the initial k centers for the k-means heuristic, and is a simple sampling procedure.
Tight lower bound instances for k-means++ in two dimensions
Notice of Violation of IEEE Publication Principles: K-means versus k-means++ clustering technique
By augmenting k-means with a very simple, randomized seeding technique, this paper obtains an algorithm that is O(log k)-competitive with the optimal clustering.


How Fast Is the k-Means Method?
This is the first construction showing that the k-means heuristic requires more than a polylogarithmic number of iterations, and the spread of the point set in this construction is polynomial.
Worst-case and Smoothed Analysis of the ICP Algorithm, with an Application to the k-means Method
A worst-case lower bound and a smoothed upper bound on the number of iterations performed by the iterative closest point (ICP) algorithm are shown; the smoothed complexity of ICP is polynomial, independent of the dimensionality of the data.
A local search approximation algorithm for k-means clustering
This work considers the question of whether there exists a simple and practical approximation algorithm for k-means clustering, and presents a local improvement heuristic based on swapping centers in and out that yields a (9+ε)-approximation algorithm.
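The local improvement heuristic described here repeatedly tries replacing one current center with one input point and keeps any swap that lowers the k-means cost. A brute-force sketch of this single-swap search (names are illustrative; the actual algorithm uses geometric data structures to make this efficient):

```python
import numpy as np

def kmeans_cost(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return np.min(((points[:, None] - centers[None, :]) ** 2).sum(axis=2), axis=1).sum()

def single_swap_local_search(points, centers, max_rounds=50):
    """Swap-based local search: try every (center, point) swap and keep
    any that strictly lowers the cost, until no single swap improves."""
    centers = centers.copy()
    for _ in range(max_rounds):
        best = kmeans_cost(points, centers)
        improved = False
        for j in range(len(centers)):
            for p in points:
                trial = centers.copy()
                trial[j] = p
                c = kmeans_cost(points, trial)
                if c < best:
                    centers, best, improved = trial, c, True
        if not improved:
            return centers  # locally optimal under single swaps
    return centers
```

The (9+ε) guarantee bounds how far such a single-swap local optimum can be from the best possible clustering cost.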
How Fast Is k-Means?
The k-means algorithm is probably the most widely used clustering heuristic, and has the reputation of being fast. How fast is it exactly? Almost no non-trivial time bounds are known for it.
Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering: (extended abstract)
The optimum solution to the k-clustering problem is characterized by the ordinary Euclidean Voronoi diagram and the weighted Voronoi diagram with both multiplicative and additive weights.
k-means projective clustering
An extension of the k-means clustering algorithm for projective clustering in arbitrary subspaces is presented, taking into account the inherent trade-off between the dimension of a subspace and the induced clustering error.
Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time
The smoothed analysis of algorithms is introduced, which continuously interpolates between the worst-case and average-case analyses of algorithms, and it is shown that the simplex algorithm has smoothed complexity polynomial in the input size and the standard deviation of Gaussian perturbations.
A fast hybrid k-means level set algorithm for segmentation
  • F. Gibou
  • Computer Science, Mathematics
  • 2005
The proposed method retains spatial coherence on initial data characteristic of curve evolution techniques, as well as the balance between a pixel/voxel’s proximity to the curve and its intention to cross over the curve from the underlying energy.
Large-scale clustering of cDNA-fingerprinting data.
A pairwise similarity measure between two p-dimensional data points, x and y, is introduced that is superior to commonly used metric distances such as Euclidean distance, and a modified version of mutual information is introduced as a novel method for validating clustering results when the true clustering is known.
Least squares quantization in PCM
  • S. P. Lloyd
  • Computer Science
    IEEE Trans. Inf. Theory
  • 1982
The corresponding result for any finite number of quanta is derived; that is, necessary conditions are found that the quanta and associated quantization intervals of an optimum finite quantization scheme must satisfy.