# Greedy bi-criteria approximations for k-medians and k-means

@article{Hsu2016GreedyBA, title={Greedy bi-criteria approximations for k-medians and k-means}, author={Daniel J. Hsu and Matus Telgarsky}, journal={ArXiv}, year={2016}, volume={abs/1607.06203} }

This paper investigates the following natural greedy procedure for clustering in the bi-criterion setting: iteratively grow a set of centers, in each round adding the center from a candidate set that maximally decreases clustering cost. In the case of $k$-medians and $k$-means, the key results are as follows.

- When the method considers all data points as candidate centers, selecting $\mathcal{O}(k\log(1/\varepsilon))$ centers achieves cost at most $2+\varepsilon$ times the…
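The greedy procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Points are assumed to be Euclidean tuples, and the caller supplies the candidate set (for example, all data points). `power=1` gives the k-medians cost and `power=2` the k-means cost. The function names are hypothetical. Running more rounds than $k$ is what makes the guarantee bi-criteria.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def clustering_cost(points: Sequence[Point], centers: Sequence[Point], power: int) -> float:
    """Sum over points of (distance to nearest center) ** power.

    power=1 gives the k-medians cost, power=2 the k-means cost.
    """
    return sum(min(math.dist(p, c) for c in centers) ** power for p in points)


def greedy_centers(
    points: Sequence[Point],
    candidates: Sequence[Point],
    num_rounds: int,
    power: int = 2,
) -> Tuple[List[Point], float]:
    """Hypothetical sketch: each round, open the candidate that most lowers the cost."""
    centers: List[Point] = []
    # contrib[i] = current cost contribution of points[i]; infinite until a center is open
    contrib = [math.inf] * len(points)
    for _ in range(num_rounds):
        best_center, best_contrib, best_total = None, None, math.inf
        for c in candidates:
            # cost contributions if c were added to the current set of centers
            trial = [min(d, math.dist(p, c) ** power) for p, d in zip(points, contrib)]
            total = sum(trial)
            if total < best_total:  # strict "<" keeps the first candidate on ties
                best_center, best_contrib, best_total = c, trial, total
        centers.append(best_center)
        contrib = best_contrib
    return centers, sum(contrib)
```

For example, take `[(0,), (1,), (10,), (11,)]` as both the data and the candidates. Two rounds of the k-means variant then open one center in each group and reach cost 2.0. The sketch re-evaluates every candidate from scratch in each round, which means $O(|\text{candidates}| \cdot n)$ distance computations per round. It favors clarity over speed.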

## 7 Citations

Noisy, Greedy and Not So Greedy k-means++

- Computer Science; ESA
- 2020

It is proved that noisy k-means++ computes an $O(\log^2 k)$-approximation in expectation. The authors also present a family of instances on which greedy k-means++ yields only an $\Omega(\ell\cdot\log k)$-approximation.
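For context, greedy k-means++ differs from standard k-means++ only in how each new center is chosen. It draws $\ell$ candidates from the $D^2$ distribution, where the probability is proportional to the squared distance to the nearest chosen center. It then keeps the candidate that lowers the k-means cost the most. The sketch below is an illustrative assumption, not code from the cited paper. The function name is hypothetical, and the first-center rule (uniform at random) follows the common description of k-means++. With `ell=1` it reduces to standard k-means++.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def greedy_kmeanspp(points: Sequence[Point], k: int, ell: int, rng: random.Random) -> List[Point]:
    """Hypothetical sketch of greedy k-means++ seeding with ell candidates per step."""
    first = rng.choice(points)  # first center: uniform at random
    centers = [first]
    # d2[i] = squared distance from points[i] to its nearest chosen center
    d2 = [math.dist(p, first) ** 2 for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            break  # every point coincides with a center; nothing left to sample
        # D^2 sampling: draw ell candidates with probability proportional to d2
        candidates = rng.choices(points, weights=d2, k=ell)
        best_cost, best_center, best_d2 = math.inf, None, None
        for c in candidates:
            trial = [min(d, math.dist(p, c) ** 2) for p, d in zip(points, d2)]
            if sum(trial) < best_cost:  # keep the candidate with the lowest resulting cost
                best_cost, best_center, best_d2 = sum(trial), c, trial
        centers.append(best_center)
        d2 = best_d2
    return centers
```

The "greedy" step only changes which sampled point is kept. The sampling distribution itself is the same as in standard k-means++.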

A constant FPT approximation algorithm for hard-capacitated k-means

- Computer Science, Mathematics
- 2019

This work proposes an FPT($k$) algorithm with a performance guarantee of $69+\epsilon$ for any instance of hard-capacitated k-means (HCKM), a problem known to be at least APX-hard.

Low-Rank Approximation from Communication Complexity

- Computer Science; ArXiv
- 2019

It is shown that different models of communication yield algorithms for natural variants of masked low-rank approximation, and that multi-player number-in-hand communication complexity connects to masked tensor decomposition and non-deterministic communication complexity to masked Boolean low-rank factorization.

A survey on theory and algorithms for k-means problems

- Computer Science; SCIENTIA SINICA Mathematica
- 2020

This paper introduces effective algorithms for the classical k-means problem and its variants, based on local search, linear programming rounding, primal-dual, dual-fitting, Lagrangian relaxation, and other techniques. It also surveys several important variants of k-means problems.

A constant parameterized approximation for hard-capacitated k-means

- Computer Science; ArXiv
- 2019

This paper proposes an FPT approximation for HCKM that violates none of the hard constraints, with running time $2^{O(k\log k)}n^{O(1)}$ and performance guarantee $69+\epsilon$.

Relative Error Tensor Low Rank Approximation

- Computer Science; Electron. Colloquium Comput. Complex.
- 2018

The first relative-error low-rank approximations for tensors are given for a large number of robust error measures for which nothing was previously known. The paper also gives results for column, row, and tube subset selection.

Coresets for (k, l)-Clustering under the Fréchet Distance

- Computer Science, Mathematics; ArXiv
- 2019

This thesis considers clustering polygonal curves, i.e., curves composed of line segments, under the Fréchet distance. It develops a method for constructing a notably smaller set of curves with very similar clustering behavior.

## References

The first 10 of 31 references are listed below.

A Bi-Criteria Approximation Algorithm for k-Means

- Computer Science, Mathematics; APPROX-RANDOM
- 2016

New bi-criteria approximation algorithms, based on linear programming and local search, respectively, are given, which attain a guarantee of $\alpha(\beta)$ depending on the number of clusters that may be opened, and are applicable in high-dimensional settings.

Stability Yields a PTAS for k-Median and k-Means Clustering

- Computer Science, Mathematics; 2010 IEEE 51st Annual Symposium on Foundations of Computer Science
- 2010

The distance of the clustering found to the target is improved from $O(\delta)$ to $\delta$ when all target clusters are large. For $k$-median, the authors also improve the "largeness" condition needed in the work of Balcan et al. to get exactly $\delta$-close, from $O(\delta n)$ to $\delta n$.

A Nearly Linear-Time Approximation Scheme for the Euclidean k-Median Problem

- Mathematics, Computer Science; SIAM J. Comput.
- 2007

This paper provides a randomized approximation scheme for the k-median problem when the input points lie in the d-dimensional Euclidean space and develops a structure theorem to describe hierarchical decomposition of solutions.

On Variants of k-means Clustering

- Computer Science, Mathematics; SoCG
- 2016

A "bi-criterion" local search algorithm for $k$-means is presented. It uses $(1+\varepsilon)k$ centers, yields a solution whose cost is at most $(1+\varepsilon)$ times the cost of an optimal $k$-means solution, and runs in polynomial time for any fixed dimension.
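The sketch below shows the basic move behind such algorithms using a generic single-swap local search for k-means. It is a simplification, not the cited algorithm, and the cited guarantee does not follow from it. It starts from an arbitrary set of `num_centers` candidates and swaps one open center for one candidate whenever that lowers the cost by at least a factor of $(1-\delta)$. It stops at a local optimum. Setting `num_centers` to roughly $(1+\varepsilon)k$ mimics the bi-criterion setting. All names and the `delta` threshold are hypothetical.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(math.dist(p, c) for c in centers) ** 2 for p in points)


def single_swap_local_search(
    points: Sequence[Point],
    candidates: Sequence[Point],
    num_centers: int,
    delta: float = 1e-3,
) -> Tuple[List[Point], float]:
    """Hypothetical sketch: swap one center at a time until no swap helps enough."""
    centers = list(candidates[:num_centers])  # arbitrary starting solution
    cost = kmeans_cost(points, centers)
    improved = True
    while improved:
        improved = False
        for i, c in product(range(len(centers)), candidates):
            if c in centers:
                continue
            trial = centers[:i] + [c] + centers[i + 1:]
            trial_cost = kmeans_cost(points, trial)
            # Accept only swaps that improve the cost by a (1 - delta) factor.
            if trial_cost < (1 - delta) * cost:
                centers, cost = trial, trial_cost
                improved = True
                break  # restart the scan from the new solution
    return centers, cost
```

Every accepted swap multiplies the cost by less than $(1-\delta)$, so the loop cannot cycle. It stops once no single swap clears that threshold.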

The Hardness of Approximation of Euclidean k-Means

- Mathematics, Computer Science; SoCG
- 2015

The first hardness-of-approximation result for the Euclidean $k$-means problem is obtained via an efficient reduction from vertex cover on triangle-free graphs. In that problem, given a triangle-free graph, the goal is to choose the fewest vertices that are incident on all the edges.

Local Search Yields a PTAS for k-Means in Doubling Metrics

- Mathematics, Computer Science; 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)
- 2016

This settles whether k-means in $\mathbb{R}^d$ admits a PTAS for every fixed $d$: a simple local search algorithm provides one. The analysis also extends easily to more general settings where the metric may not be Euclidean but still has fixed doubling dimension.

A Nearly Linear-Time Approximation Scheme for the Euclidean k-median Problem

- Mathematics, Computer Science; ESA
- 1999

A randomized approximation scheme is given for points in $d$-dimensional Euclidean space, with running time $O(2^{1/\varepsilon^{d}}\, n\log n\log k)$, which is nearly linear for any fixed $\varepsilon$ and $d$. The paper also develops a structure theorem that describes a hierarchical decomposition of solutions.

Linear Time Algorithms for Clustering Problems in Any Dimensions

- Computer Science, Mathematics; ICALP
- 2005

This work generalizes the k-means algorithm and shows that the result can solve a larger class of clustering problems that satisfy certain properties (existence of a random sampling procedure and tightness). This yields $O(2^{(k/\varepsilon)^{O(1)}}dn)$-time $(1+\varepsilon)$-approximation algorithms for these problems.

A PTAS for k-means clustering based on weak coresets

- Computer Science; SCG '07
- 2007

Every unweighted point set $P$ has a weak coreset of size $\mathrm{poly}(k,1/\varepsilon)$ for the k-means clustering problem, i.e., its size is independent of the cardinality $|P|$ of the point set and the dimension $d$ of the Euclidean space $\mathbb{R}^d$.

On Approximate Geometric K-clustering

- Computer Science, Mathematics
- 1999

A deterministic algorithm is presented that finds a 2-clustering with cost no worse than $(1+\varepsilon)$ times the minimum cost in time $O(n\log n)$; the constant of proportionality depends polynomially on $\varepsilon$.