• Corpus ID: 6236780

Greedy bi-criteria approximations for k-medians and k-means

Daniel J. Hsu and Matus Telgarsky
This paper investigates the following natural greedy procedure for clustering in the bi-criterion setting: iteratively grow a set of centers, in each round adding the center from a candidate set that maximally decreases clustering cost. In the case of $k$-medians and $k$-means, the key results are as follows. $\bullet$ When the method considers all data points as candidate centers, then selecting $\mathcal{O}(k\log(1/\varepsilon))$ centers achieves cost at most $2+\varepsilon$ times the… 
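
The greedy procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `greedy_centers`, its parameters, and the toy data are hypothetical, and the number of rounds is left as an input because the abstract gives the $\mathcal{O}(k\log(1/\varepsilon))$ bound only up to constants. Setting `power=1` uses the k-medians cost (sum of distances) and `power=2` the k-means cost (sum of squared distances).

```python
import numpy as np

def greedy_centers(points, candidates, num_centers, power=2):
    """Greedily pick `num_centers` rows of `candidates` (hypothetical helper).

    In each round, add the candidate that most decreases the clustering
    cost, i.e. the sum over points of (distance to nearest center)**power.
    Returns the chosen candidate indices and the cost after each round.
    """
    # dist[i, j] = ||points[i] - candidates[j]|| ** power
    diffs = points[:, None, :] - candidates[None, :, :]
    dist = np.linalg.norm(diffs, axis=2) ** power

    best = np.full(len(points), np.inf)  # per-point cost under current centers
    chosen, costs = [], []
    for _ in range(num_centers):
        # totals[j] = clustering cost if candidate j were added now
        totals = np.minimum(best[:, None], dist).sum(axis=0)
        j = int(np.argmin(totals))
        chosen.append(j)
        best = np.minimum(best, dist[:, j])
        costs.append(float(best.sum()))
    return chosen, costs

# Toy example (assumed data): two well-separated pairs on a line,
# with every data point allowed as a candidate center.
points = np.array([[0.0], [1.0], [10.0], [11.0]])
chosen, costs = greedy_centers(points, points, num_centers=3)
print(chosen, costs)
```

Because each round only ever adds a center, the recorded cost is non-increasing from round to round. The sketch materializes the full point-by-candidate distance matrix, which is convenient for small inputs but uses memory proportional to the number of points times the number of candidates.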


Noisy, Greedy and Not So Greedy k-means++
It is proved that noisy k-means++ computes an $O(\log^2 k)$-approximation in expectation, and a family of instances is presented on which greedy k-means++ yields only an $\Omega(\ell\cdot \log k)$-approximation.
Coresets for (k, l)-Clustering under the Fréchet Distance
This thesis considers clustering polygonal curves, i.e., curves composed of line segments, under the Fréchet distance, and develops a construction method for a notably smaller set of curves that has very similar clustering behavior.
A constant FPT approximation algorithm for hard-capacitated k-means
This work proposes an FPT($k$) approximation algorithm with a performance guarantee of $69+\epsilon$ for any instance of hard-capacitated k-means (HCKM), a problem known to be at least APX-hard.
Low-Rank Approximation from Communication Complexity
It is shown that different models of communication yield algorithms for natural variants of masked low-rank approximation, and that multi-player number-in-hand communication complexity connects to masked tensor decomposition and non-deterministic communication complexity to masked Boolean low-rank factorization.
A survey on theory and algorithms for $k$-means problems
This paper introduces effective algorithms based on local search, linear programming rounding, primal-dual, dual-fitting, Lagrange relaxation and other techniques for the classical k-means problem and its variants, and surveys several important variants of k-means problems.
A constant parameterized approximation for hard-capacitated k-means
This paper proposes an FPT approximation for HCKM without violating any hard constraints, whose running time is $2^{O(k\log k)}n^{O(1)}$ and whose performance guarantee is $69+\epsilon$.
Relative Error Tensor Low Rank Approximation
The first relative error low rank approximations for tensors are given for a large number of robust error measures for which nothing was previously known, as well as column, row, and tube subset selection.


A Bi-Criteria Approximation Algorithm for k-Means
New bi-criteria approximation algorithms, based on linear programming and local search, respectively, are given, which attain a guarantee of $\alpha(\beta)$ depending on the number of clusters that may be opened, and are applicable in high-dimensional settings.
Stability Yields a PTAS for k-Median and k-Means Clustering
Improvements are made to the distance of the clustering found to the target from $O(\delta)$ to $\delta$ when all target clusters are large, and for $k$-median the authors improve the "largeness" condition needed in the work of Balcan et al. to get exactly $\delta$-close from $O(\delta n)$ to $\delta n$.
A Constant-Factor Bi-Criteria Approximation Guarantee for k-means++
It is shown that for any constant factor $\beta > 1$, selecting $\beta k$ cluster centers by $D^\ell$ sampling yields a constant-factor approximation to the optimal clustering with $k$ centers, in expectation and without conditions on the dataset.
A Nearly Linear-Time Approximation Scheme for the Euclidean k-Median Problem
This paper provides a randomized approximation scheme for the k-median problem when the input points lie in the d-dimensional Euclidean space and develops a structure theorem to describe hierarchical decomposition of solutions.
On Variants of k-means Clustering
A "bi-criterion" local search algorithm for $k$-means which uses $(1+\varepsilon)k$ centers and yields a solution whose cost is at most $(1+\varepsilon)$ times the cost of an optimal $k$-means solution, and which runs in polynomial time for any fixed dimension.
The Hardness of Approximation of Euclidean k-Means
The first hardness of approximation for the Euclidean $k$-means problem is provided via an efficient reduction from the vertex cover problem on triangle-free graphs: given a triangle-free graph, the goal is to choose the fewest number of vertices which are incident on all the edges.
Local Search Yields a PTAS for k-Means in Doubling Metrics
The problem is settled by showing that a simple local search algorithm provides a PTAS for k-means in $\mathbb{R}^d$ for any fixed $d$, and this analysis extends very easily to the more general settings where the metric may not be Euclidean but still has fixed doubling dimension.
A Nearly Linear-Time Approximation Scheme for the Euclidean kappa-median Problem
A randomized approximation scheme is given for points in $d$-dimensional Euclidean space, with running time $O(2^{1/\varepsilon^d} n \log n \log k)$, which is nearly linear for any fixed $\varepsilon$ and $d$; a structure theorem describing the hierarchical decomposition of solutions is also developed.
Linear Time Algorithms for Clustering Problems in Any Dimensions
This work generalizes the k-means algorithm and shows that the resulting algorithm can solve a larger class of clustering problems that satisfy certain properties (existence of a random sampling procedure and tightness), resulting in $O(2^{(k/\varepsilon)^{O(1)}}dn)$ time $(1+\varepsilon)$-approximation algorithms for these problems.
A PTAS for k-means clustering based on weak coresets
Every unweighted point set $P$ has a weak coreset of size $\mathrm{poly}(k,1/\varepsilon)$ for the k-means clustering problem, i.e. its size is independent of the cardinality $|P|$ of the point set and the dimension $d$ of the Euclidean space $\mathbb{R}^d$.