• Corpus ID: 6236780

# Greedy bi-criteria approximations for k-medians and k-means

@article{Hsu2016GreedyBA,
title={Greedy bi-criteria approximations for k-medians and k-means},
author={Daniel J. Hsu and Matus Telgarsky},
journal={ArXiv},
year={2016},
volume={abs/1607.06203}
}
• Published 21 July 2016
• Computer Science
• ArXiv
This paper investigates the following natural greedy procedure for clustering in the bi-criteria setting: iteratively grow a set of centers, in each round adding the center from a candidate set that maximally decreases the clustering cost. In the case of $k$-medians and $k$-means, the key results are as follows.
• When the method considers all data points as candidate centers, then selecting $\mathcal{O}(k\log(1/\varepsilon))$ centers achieves cost at most $2+\varepsilon$ times the…
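The greedy procedure described in the abstract can be sketched in a few lines (a minimal illustration only, assuming Euclidean points and the k-means cost; the function name and structure are mine, not the paper's):

```python
def greedy_centers(points, num_centers):
    """Sketch of greedy bi-criteria center selection: candidates are all
    data points; each round adds the candidate that most decreases the
    k-means cost, i.e. the sum of squared distances from every point to
    its nearest chosen center."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    chosen = []
    # Squared distance from each point to its nearest chosen center so far.
    best = [float("inf")] * len(points)
    for _ in range(num_centers):
        # Total cost if candidate c were added next.
        def cost_with(c):
            return sum(min(b, sq_dist(p, c)) for b, p in zip(best, points))

        j = min(range(len(points)), key=lambda i: cost_with(points[i]))
        chosen.append(j)
        best = [min(b, sq_dist(p, points[j])) for b, p in zip(best, points)]
    return chosen, sum(best)
```

Each round scans all candidates, so selecting the $\mathcal{O}(k\log(1/\varepsilon))$ centers mentioned above costs $\mathcal{O}(n^2)$ distance evaluations per round in this naive form.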
## 7 Citations


Noisy, Greedy and Not So Greedy k-means++
• Computer Science
ESA
• 2020
It is proved that noisy k-means++ computes an $O(\log^2 k)$-approximation in expectation, and a family of instances is presented on which greedy k-means++ yields only an $\Omega(\ell\cdot\log k)$-approximation.
Coresets for (k, l)-Clustering under the Fréchet Distance
This thesis considers clustering polygonal curves, i.e., curves composed of line segments, under the Fréchet distance, and develops a construction method for a notably smaller set of curves that exhibits very similar clustering behavior.
A constant FPT approximation algorithm for hard-capacitated k-means
• Computer Science, Mathematics
• 2019
This work proposes an FPT (in $k$) algorithm with a performance guarantee of $69+\epsilon$ for any HCKM instance; the problem is known to be at least APX-hard.
Low-Rank Approximation from Communication Complexity
• Computer Science
ArXiv
• 2019
It is shown that different models of communication yield algorithms for natural variants of masked low-rank approximation, and that multi-player number-in-hand communication complexity connects to masked tensor decomposition and non-deterministic communication complexity to masked Boolean low-rank factorization.
A survey on theory and algorithms for $k$-means problems
• Computer Science
SCIENTIA SINICA Mathematica
• 2020
This paper introduces effective algorithms based on local search, linear programming rounding, primal-dual, dual-fitting, Lagrange relaxation and other techniques for the classical k-means problem and its variants, and surveys several important variants of k-means problems.
A constant parameterized approximation for hard-capacitated k-means
• Computer Science
ArXiv
• 2019
This paper proposes an FPT approximation for HCKM without violating any hard constraints, whose running time is $2^{O(k\log k)}n^{O(1)}$ and performance guarantee is $69+\epsilon$.
Relative Error Tensor Low Rank Approximation
• Computer Science
Electron. Colloquium Comput. Complex.
• 2018
The first relative error low rank approximations for tensors are given for a large number of robust error measures for which nothing was known, as well as column, row, and tube subset selection.

## References

SHOWING 1-10 OF 32 REFERENCES
A Bi-Criteria Approximation Algorithm for k-Means
• Computer Science, Mathematics
APPROX-RANDOM
• 2016
New bi-criteria approximation algorithms, based on linear programming and local search, respectively, are given, which attain a guarantee of $\alpha(\beta)$ depending on the number of clusters that may be opened, and are applicable in high-dimensional settings.
Stability Yields a PTAS for k-Median and k-Means Clustering
• Computer Science, Mathematics
2010 IEEE 51st Annual Symposium on Foundations of Computer Science
• 2010
Improvements are made to the distance of the clustering found to the target from $O(\delta)$ to $\delta$ when all target clusters are large, and for $k$-median the authors improve the "largeness" condition needed in the work of Balcan et al. to get exactly $\delta$-close from $O(\delta n)$ to $\delta n$.
A Constant-Factor Bi-Criteria Approximation Guarantee for k-means++
It is shown that for any constant factor $\beta > 1$, selecting $\beta k$ cluster centers by $D^\ell$ sampling yields a constant-factor approximation to the optimal clustering with $k$ centers, in expectation and without conditions on the dataset.
A Nearly Linear-Time Approximation Scheme for the Euclidean k-Median Problem
• Mathematics, Computer Science
SIAM J. Comput.
• 2007
This paper provides a randomized approximation scheme for the k-median problem when the input points lie in $d$-dimensional Euclidean space, and develops a structure theorem to describe hierarchical decomposition of solutions.
On Variants of k-means Clustering
• Computer Science, Mathematics
SoCG
• 2016
A "bi-criteria" local search algorithm for $k$-means is given which uses $(1+\varepsilon)k$ centers and yields a solution whose cost is at most $(1-\varepsilon)$ times the cost of an optimal $k$-means solution, and which runs in polynomial time for any fixed dimension.
The Hardness of Approximation of Euclidean k-Means
• Mathematics, Computer Science
SoCG
• 2015
The first hardness of approximation for the Euclidean $k$-means problem is provided via an efficient reduction from the vertex cover problem on triangle-free graphs: given a triangle-free graph, the goal is to choose the minimum number of vertices that are incident on all the edges.
Local Search Yields a PTAS for k-Means in Doubling Metrics
• Mathematics, Computer Science
2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)
• 2016
The problem is settled by showing that a simple local search algorithm provides a PTAS for $k$-means in $\mathbb{R}^d$ for any fixed $d$, and this analysis extends very easily to the more general settings where the metric may not be Euclidean but still has fixed doubling dimension.
A Nearly Linear-Time Approximation Scheme for the Euclidean kappa-median Problem
• Mathematics, Computer Science
ESA
• 1999
A randomized approximation scheme for points in $d$-dimensional Euclidean space, with running time $O(2^{1/\varepsilon^d}\, n\log n\log k)$, which is nearly linear for any fixed $\varepsilon$ and $d$, is given, and a structure theorem is developed to describe hierarchical decomposition of solutions.
Linear Time Algorithms for Clustering Problems in Any Dimensions
• Computer Science, Mathematics
ICALP
• 2005
This work generalizes the k-means algorithm and shows that the resulting algorithm can solve a larger class of clustering problems that satisfy certain properties (existence of a random sampling procedure and tightness), resulting in $O(2^{(k/\epsilon)^{O(1)}}\, dn)$-time $(1+\epsilon)$-approximation algorithms for these problems.
A PTAS for k-means clustering based on weak coresets
• Computer Science
SCG '07
• 2007
Every unweighted point set $P$ has a weak coreset of size $\mathrm{poly}(k, 1/\varepsilon)$ for the k-means clustering problem, i.e., its size is independent of the cardinality $|P|$ of the point set and the dimension $d$ of the Euclidean space $\mathbb{R}^d$.