# How slow is the k-means method?

@inproceedings{Arthur2006HowSI, title={How slow is the k-means method?}, author={David Arthur and Sergei Vassilvitskii}, booktitle={SCG '06}, year={2006} }

The **k-means** method is an old but popular clustering algorithm known for its observed speed and its simplicity. Until recently, however, no meaningful theoretical bounds were known on its running time. In this paper, we demonstrate that the worst-case running time of **k-means** is *superpolynomial* by improving the best known lower bound from Ω(*n*) iterations to 2^Ω(√*n*).
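For readers unfamiliar with the algorithm being analyzed, here is a minimal sketch of the standard k-means (Lloyd's) iteration; the quantity the paper bounds is the number of passes of this loop until the centers stop moving. Pure Python with no dependencies; the function names are illustrative, not from the paper.

```python
def assign(points, centers):
    """Assign each point to its nearest center (squared Euclidean distance)."""
    clusters = [[] for _ in centers]
    for p in points:
        d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        clusters[d.index(min(d))].append(p)
    return clusters

def recenter(clusters, centers):
    """Move each center to the mean of its cluster (keep it if the cluster is empty)."""
    new = []
    for cl, c in zip(clusters, centers):
        if cl:
            new.append(tuple(sum(coord) / len(cl) for coord in zip(*cl)))
        else:
            new.append(c)
    return new

def kmeans(points, centers):
    """Iterate until a fixed point; return the centers and the iteration count."""
    iters = 0
    while True:
        iters += 1
        new = recenter(assign(points, centers), centers)
        if new == centers:
            return centers, iters
        centers = new
```

On well-separated data this converges in a couple of iterations; the paper's contribution is a construction forcing 2^Ω(√n) of them.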


## 348 Citations

k-Means Has Polynomial Smoothed Complexity

- Computer Science, 2009 50th Annual IEEE Symposium on Foundations of Computer Science
- 2009

The smoothed running time of the k-means method is settled, showing that the smoothed number of iterations is bounded by a polynomial in n and 1/σ, where σ is the standard deviation of the Gaussian perturbations.

Smoothed Analysis of the k-Means Method

- Computer Science, JACM
- 2011

The smoothed running time of the k-means method is settled, showing that the smoothed number of iterations is bounded by a polynomial in n and 1/σ, where σ is the standard deviation of the Gaussian perturbations.

k-means requires exponentially many iterations even in the plane

- Computer Science, Mathematics, SCG '09
- 2009

This work proves the existence of super-polynomial lower bounds for any d ≥ 2 and improves the lower bound by presenting a simple construction in the plane that leads to the exponential lower bound 2^Ω(n).

The Complexity of the k-means Method

- Computer Science, ESA
- 2016

It is proved that the k-means method can implicitly solve PSPACE-complete problems, providing a complexity-theoretic explanation for its worst-case running time.

Theoretical Analysis of the k-Means Algorithm - A Survey

- Computer Science, Algorithm Engineering
- 2016

This paper surveys the recent results in this direction as well as several extensions of the basic k-means method that can be used to improve the algorithm.

On the Lower Bound of Local Optimums in K-Means Algorithm

- Computer Science, Sixth International Conference on Data Mining (ICDM '06)
- 2006

This paper proposes an efficient method to compute a lower bound on the cost of the local optimum from the current center set, and shows that this method can greatly prune unnecessary iterations and improve the efficiency of the algorithm on most data sets, especially those with high dimensionality and large k.

k-means++: the advantages of careful seeding

- Computer Science, SODA '07
- 2007

By augmenting k-means with a very simple, randomized seeding technique, this work obtains an algorithm that is O(log k)-competitive with the optimal clustering.
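The seeding rule summarized above can be sketched as follows: the first center is chosen uniformly at random, and each subsequent center is sampled with probability proportional to D(x)², the squared distance from x to the nearest center chosen so far. A hedged pure-Python sketch, with illustrative names:

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeanspp_seed(points, k, rng=random):
    """Pick k initial centers using D^2 weighting (k-means++ style seeding)."""
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # D(x)^2 for every point: squared distance to the nearest chosen center
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        total = sum(d2)
        if total == 0:  # every point coincides with a chosen center
            centers.append(rng.choice(points))
            continue
        # sample the next center proportionally to D(x)^2
        r = rng.random() * total
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
    return centers
```

Because points far from all current centers carry most of the D² weight, the chosen centers tend to land in distinct, well-separated groups, which is what drives the O(log k) competitiveness guarantee.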

A Tight Lower Bound Instance for k-means++ in Constant Dimension

- Computer Science, TAMC
- 2014

The k-means++ seeding algorithm is one of the most popular algorithms that is used for finding the initial k centers when using the k-means heuristic. The algorithm is a simple sampling procedure and…

Tight lower bound instances for k-means++ in two dimensions

- Computer Science, Mathematics, Theor. Comput. Sci.
- 2016

Notice of Violation of IEEE Publication Principles: K-means versus k-means++ clustering technique

- Computer Science, 2012 Students Conference on Engineering and Systems
- 2012

By augmenting k-means with a very simple, randomized seeding technique, this paper obtains an algorithm that is O(log k)-competitive with the optimal clustering.

## References


How Fast Is the k-Means Method?

- Computer Science, SODA '05
- 2005

This is the first construction showing that the k-means heuristic requires more than a polylogarithmic number of iterations, and the spread of the point set in this construction is polynomial.

Worst-case and Smoothed Analysis of the ICP Algorithm, with an Application to the k-means Method

- Computer Science, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06)
- 2006

A worst-case lower bound and a smoothed upper bound on the number of iterations performed by the iterative closest point (ICP) algorithm are shown; the smoothed complexity of ICP is polynomial, independent of the dimensionality of the data.

A local search approximation algorithm for k-means clustering

- Computer Science, SCG '02
- 2002

This work considers the question of whether there exists a simple and practical approximation algorithm for k-means clustering, and presents a local improvement heuristic based on swapping centers in and out that yields a (9+ε)-approximation algorithm.
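The swap-based local improvement idea summarized above can be illustrated with a small sketch. This is a simplified, hedged toy version (single swaps only, candidate centers restricted to the input points), not the paper's algorithm or its approximation analysis; all names are illustrative.

```python
def cost(points, centers):
    """Total squared distance from each point to its nearest center."""
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
               for p in points)

def local_search(points, centers):
    """Greedy single-swap local search: swap one center for one candidate
    point whenever that lowers the cost; stop at a swap-local optimum."""
    centers = list(centers)
    improved = True
    while improved:
        improved = False
        best = cost(points, centers)
        for i in range(len(centers)):
            for cand in points:
                trial = centers[:i] + [cand] + centers[i + 1:]
                c = cost(points, trial)
                if c < best:
                    centers, best, improved = trial, c, True
    return centers
```

Even from a poor initialization (e.g. all centers in one cluster), a single swap can move a center to an uncovered cluster, which is the intuition behind the (9+ε) guarantee of the full multi-swap analysis.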

How Fast Is k-Means?

- Computer Science, COLT
- 2003

The k-means algorithm is probably the most widely used clustering heuristic, and has the reputation of being fast. How fast is it exactly? Almost no non-trivial time bounds are known for it.

Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering: (extended abstract)

- Computer Science, SCG '94
- 1994

The optimum solution to the k-clustering problem is characterized by the ordinary Euclidean Voronoi diagram and the weighted Voronoi diagram with both multiplicative and additive weights.

k-means projective clustering

- Computer Science, PODS '04
- 2004

An extension of the k-means clustering algorithm for projective clustering in arbitrary subspaces is presented, taking into account the inherent trade-off between the dimension of a subspace and the induced clustering error.

Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time

- Computer Science
- 2004

The smoothed analysis of algorithms is introduced, which continuously interpolates between the worst-case and average-case analyses of algorithms, and it is shown that the simplex algorithm has smoothed complexity polynomial in the input size and the standard deviation of Gaussian perturbations.

A fast hybrid k-means level set algorithm for segmentation

- Computer Science, Mathematics
- 2005

The proposed method retains the spatial coherence of the initial data that is characteristic of curve-evolution techniques, as well as the balance between a pixel/voxel's proximity to the curve and its tendency to cross over the curve, derived from the underlying energy.

Large-scale clustering of cDNA-fingerprinting data.

- Computer Science, Genome Research
- 1999

A pairwise similarity measure between two p-dimensional data points, x and y, is introduced that is superior to commonly used metric distances such as Euclidean distance, and a modified version of mutual information is introduced as a novel method for validating clustering results when the true clustering is known.

Least squares quantization in PCM

- Computer Science, IEEE Trans. Inf. Theory
- 1982

The corresponding result for any finite number of quanta is derived; that is, necessary conditions are found that the quanta and associated quantization intervals of an optimum finite quantization scheme must satisfy.