k-means Requires Exponentially Many Iterations Even in the Plane

  • Andrea Vattani
  • Published 2011
  • Computer Science, Mathematics
  • Discrete & Computational Geometry
The k-means algorithm is a well-known method for partitioning n points that lie in d-dimensional space into k clusters. Its main features are simplicity and speed in practice. Theoretically, however, the best known upper bound on its running time (i.e., $n^{O(kd)}$) is, in general, exponential in the number of points (when $kd = \Omega(n/\log n)$). Recently Arthur and Vassilvitskii (Proceedings of the 22nd Annual Symposium on Computational Geometry, pp. 144–153, 2006) showed a super-polynomial worst-case…
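The running-time bounds discussed in the abstract count iterations of the standard k-means (Lloyd) procedure. The following is a minimal sketch of that procedure in plain Python, written for illustration rather than taken from the paper; the function name `lloyd_kmeans`, the tie-breaking rule (lowest center index wins), and the `max_iter` cap are assumptions.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, centers, max_iter=10_000):
    """Run Lloyd's iteration from the given initial centers.

    Returns (centers, assignment, iterations), where `iterations` counts the
    rounds in which the assignment of points to centers changed.
    Illustrative sketch: ties go to the lowest center index, and a center
    that loses all of its points is left where it is.
    """
    centers = [tuple(c) for c in centers]
    assignment = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest center.
        new_assignment = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            # Nothing moved, so the clustering is stable.
            return centers, assignment, iteration - 1
        assignment = new_assignment
        # Update step: move each center to the mean of its cluster.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                dim = len(members[0])
                centers[j] = tuple(
                    sum(p[d] for p in members) / len(members) for d in range(dim)
                )
    return centers, assignment, max_iter


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(lloyd_kmeans(pts, [(0, 0), (1, 0)]))
```

On well-separated inputs like the one in the usage example the loop stabilizes after a single round. The lower bounds concern carefully constructed inputs on which the returned iteration count grows very quickly with the number of points.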
k-means requires exponentially many iterations even in the plane
This work proves the existence of super-polynomial lower bounds for any $d \geq 2$ and improves the lower bound by presenting a simple construction in the plane that leads to the exponential lower bound $2^{\Omega(n)}$.
Exact algorithms for size constrained 2-clustering in the plane
An approximation algorithm for the uniform capacitated k-means problem
Using local search, a bi-criteria approximation algorithm for the uniform capacitated k-means problem (UC-k-means) is presented; it has a constant approximation guarantee and violates the cardinality constraint by at most a constant factor.
Clustering Perturbation Resilient Instances
This work considers stable instances of Euclidean $k$-means that admit provably polynomial-time algorithms for recovering the optimal clustering, and proposes simple algorithms, with running time linear in the number of points and the dimension, that provably recover the optimal clustering.
The seeding algorithm for spherical k-means clustering with penalties
It is proved that, for spherical k-means clustering with penalties on separable instances, the seeding algorithm achieves an approximation ratio of $2\max\{3, M+1\}$ with high probability, where M is the ratio of the maximal to the minimal penalty cost of the given data set.
A distance saving approach to the K-means problem for massive data
Experimental results indicate that the proposed approximation to the solution of the K-means problem outperforms well-known approaches in terms of the relation between number of computations and the quality of the approximation.
On the minimum of the mean-squared error in 2-means clustering
We study the minimum mean-squared error for 2-means clustering when the outcomes of the vector-valued random variable to be clustered are on two touching spheres of unit radius in $n$-dimensional
Sketching and Clustering Metric Measure Spaces
A duality between general classes of clustering and sketching problems is demonstrated, and it is proved that whereas the gap between these can be arbitrarily large, in the case of doubling metric spaces the resulting sketching objectives are polynomially related.
Analysis of Ward's Method
It is shown that Ward's method computes a $2$-approximation with respect to the $k$-means objective function if the optimal $k$-clustering is well separated, and that Ward's method produces an $\mathcal{O}(1)$-approximate clustering for one-dimensional data sets.
Scalable K-Means++
It is proved that the proposed initialization algorithm k-means|| obtains a nearly optimal solution after a logarithmic number of passes, and experimental evaluation on real-world large-scale data demonstrates that k-means|| outperforms k-means++ in both sequential and parallel settings.
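For context on the baseline named in the last entry, sequential k-means++ seeding picks each new center with probability proportional to its squared distance from the nearest center already chosen. The sketch below illustrates that baseline only; it is not the k-means|| algorithm. The function name `kmeanspp_seed` and the fixed default seed are assumptions, and the input is assumed to contain at least k distinct points.

```python
import random


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers by k-means++ style squared-distance sampling.

    Illustrative sketch: assumes `points` is a list of equal-length tuples
    containing at least k distinct points, so the weights never all vanish.
    """
    rng = rng or random.Random(0)  # fixed seed is an assumption, for repeatability
    # The first center is chosen uniformly at random.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        weights = [
            min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
            for p in points
        ]
        # Sample the next center with probability proportional to that distance.
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


if __name__ == "__main__":
    print(kmeanspp_seed([(0, 0), (0, 0), (5, 5)], 2))
```

A point that coincides with an existing center has weight zero and is never selected again, which is why the sampling tends to spread the initial centers apart.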