# Constant approximation for k-median and k-means with outliers via iterative rounding

@article{Krishnaswamy2018ConstantAF, title={Constant approximation for k-median and k-means with outliers via iterative rounding}, author={Ravishankar Krishnaswamy and Shi Li and Sai Sandeep}, journal={Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing}, year={2018} }

In this paper, we present a new iterative rounding framework for many clustering problems. Using this, we obtain an (α1 + ε ≤ 7.081 + ε)-approximation algorithm for k-median with outliers, greatly improving upon the large implicit constant approximation ratio of Chen. For k-means with outliers, we give an (α2 + ε ≤ 53.002 + ε)-approximation, which is the first O(1)-approximation for this problem. The iterative algorithm framework is very versatile; we show how it can be used to give α1- and (α1…
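For readers unfamiliar with the objective that the abstract refers to, the following is a minimal sketch of how the cost of a candidate solution could be evaluated. It is not the paper's iterative rounding algorithm. The function name `outlier_cost`, the parameter `m` (number of points allowed to be discarded), and the use of Euclidean distance are assumptions made for illustration only.

```python
import math

def outlier_cost(points, centers, m, power=1):
    """Cost of `centers` when the m farthest points are discarded as outliers.

    power=1 gives the k-median objective (sum of distances);
    power=2 gives the k-means objective (sum of squared distances).
    Euclidean distance is an assumption; any metric could be substituted.
    """
    # Connection cost of each point to its nearest center.
    costs = sorted(min(math.dist(p, c) for c in centers) ** power for p in points)
    # Keep the n - m cheapest points; the m most expensive count as outliers.
    keep = max(0, len(costs) - m)
    return sum(costs[:keep])

if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (10, 0), (11, 0), (100, 0)]
    ctrs = [(0, 0), (10, 0)]
    print(outlier_cost(pts, ctrs, m=0))  # far point is paid for
    print(outlier_cost(pts, ctrs, m=1))  # far point is discarded
```

An approximation algorithm for the problem must choose the k centers itself; this helper only scores a given choice, which is enough to compare two candidate solutions on the same instance.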

#### 59 Citations

Structural Iterative Rounding for Generalized k-Median Problems

- Computer Science, Mathematics
- ICALP
- 2021

Improved approximation algorithms for generalized k-median with outliers and knapsack median are given, allowing richer constraint sets in the iterative rounding and taking advantage of the structure of the resulting extreme points.

Greedy Sampling for Approximate Clustering in the Presence of Outliers

- Computer Science
- NeurIPS
- 2019

This work shows that for k-means and k-center clustering, simple modifications to the well-studied greedy algorithms result in nearly identical guarantees, while additionally being robust to outliers.

Improved Algorithms for Clustering with Outliers

- Computer Science, Mathematics
- ISAAC
- 2019

This paper gave the first PTAS for the k-median problem with outliers in Euclidean space R^d for possibly high m and d, and introduced a (6+epsilon)-approximation algorithm for general metric spaces with running time O(n(beta (1/epsilon)(k+m))^k) for some constant beta>1.

Outliers Detection Is Not So Hard: Approximation Algorithms for Robust Clustering Problems Using Local Search Techniques

- Computer Science, Mathematics
- ArXiv
- 2020

A new technique to analyze the approximation ratio of local search algorithms for k-median/k-means problems by introducing an adapted cluster that can capture useful information about outliers in the local and the global optimal solution.

Robust k-means++

- Computer Science, Physics
- UAI
- 2020

This work shows that using a mixture of D² sampling and uniform sampling, one can pick O(k/δ) candidate centers with the following guarantee: they contain some k centers that give an O(1)-approximation to the optimal robust k-means solution while discarding at most δn more points than the outliers discarded by the optimal solution.

An Improved Approximation Algorithm for the k-Means Problem with Penalties

- Computer Science
- FAW
- 2019

The clustering problem has received much attention in various fields of computer science. However, in many applications, the existence of noisy data poses a big challenge for the clustering…

Improved Approximation Algorithms for Individually Fair Clustering

- Computer Science
- ArXiv
- 2021

This work extends the framework of [Charikar et al., 2002, Swamy, 2016] and devises a 16-approximation algorithm for facility location with ℓp-norm cost under a matroid constraint, which might be of independent interest, and suggests a reduction from individually fair clustering to clustering with a group fairness requirement proposed by Kleindessner et al.

On Sampling Based Algorithms for k-Means

- Computer Science
- FSTTCS
- 2020

Making use of a constant factor solution for the (classical or unconstrained) k-means problem, the results of Bhattacharya et al. are generalised and a constant pass, polylog-space streaming PTAS for either of the two problems is designed.

Fault Tolerant Clustering with Outliers

- Mathematics, Computer Science
- WAOA
- 2019

This work essentially reduces the Fault Tolerant Clustering with Outliers problem to the corresponding (non-fault-tolerant) Clustering with Outliers problem, for which constant approximations are known, and shows that it is bounded by O(1) for the k-center objective, whereas it is O(f) for the k-median and sum of radii objectives.

Is Simple Uniform Sampling Efficient for Center-Based Clustering With Outliers: When and Why?

- Computer Science
- ArXiv
- 2021

This is the first work that systematically studies the effectiveness of uniform sampling from both theoretical and experimental aspects, and introduces a “significance” criterion and proves that the performance of the framework depends on the significance degree of the given instance.

#### References

Approximating k-median via pseudo-approximation

- Computer Science, Mathematics
- STOC '13
- 2013

We present a novel approximation algorithm for k-median that achieves an approximation guarantee of 1+√3+ε, improving upon the decade-old ratio of 3+ε. Our approach is based on two components, each…

Better Guarantees for k-Means and Euclidean k-Median by Primal-Dual Algorithms

- Mathematics, Computer Science
- 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS)
- 2017

A new primal-dual approach is presented that makes it possible to exploit the geometric structure of k-means and to satisfy the hard constraint that at most k clusters are selected without deteriorating the approximation guarantee.

A Dependent LP-Rounding Approach for the k-Median Problem

- Computer Science, Mathematics
- ICALP
- 2012

This paper revisits the classical k-median problem and gives an efficient algorithm to construct a probability distribution on sets of k centers that matches the marginals specified by the optimal LP solution.

A Constant-Factor Approximation Algorithm for the k-Median Problem

- Computer Science, Mathematics
- J. Comput. Syst. Sci.
- 2002

This work presents the first constant-factor approximation algorithm for the metric k-median problem, and improves upon the best previously known result of O(log k log log k), which was obtained by refining and derandomizing a randomized O(log n log log n)-approximation algorithm of Bartal.

A constant-factor approximation algorithm for the k-median problem (extended abstract)

- Computer Science
- STOC '99
- 1999

This work presents the first constant-factor approximation algorithm for the metric k-median problem, a polynomial-time algorithm that finds a feasible solution of objective function value within a factor of 6 of the optimum, and gives constant factor approximation algorithms for several natural extensions of the problem.

An Improved Approximation for k-Median and Positive Correlation in Budgeted Optimization

- Mathematics, Computer Science
- ACM Trans. Algorithms
- 2017

This work improves upon Li-Svensson’s approximation ratio for k-median by refining various aspects of their algorithm, and develops algorithms that guarantee the known properties of dependent rounding but also have nearly best-possible behavior (near-independence, which generalizes positive correlation) on “small” subsets of the variables.

Local Search Methods for k-Means with Outliers

- Computer Science
- Proc. VLDB Endow.
- 2017

This work proposes a simple local search-based algorithm for k-means clustering with outliers and proves that this algorithm achieves constant-factor approximate solutions and can be combined with known sketching techniques to scale to large data sets.

Approximation schemes for Euclidean k-medians and related problems

- Mathematics, Computer Science
- STOC '98
- 1998

An approximation scheme for the plane that for any c > 0 produces a solution of cost at most 1+ 1/c times the optimum and runs in time O(n) and generalizes to some problems related to k-median.

Improved Approximation Algorithms for Matroid and Knapsack Median Problems and Applications

- Mathematics, Computer Science
- ACM Trans. Algorithms
- 2016

A variety of seemingly disparate facility-location problems considered in the literature (data placement problem, mobile facility location, k-median forest, metric uniform minimum-latency Uncapacitated Facility Location (UFL)) in fact reduce to the matroid median or two-matroid median problems, and thus obtain improved approximation guarantees for all these problems.

Clustering under approximation stability

- Computer Science, Mathematics
- JACM
- 2013

It is shown that for any constant c > 1, (c,ε)-approximation-stability of k-median or k-means objectives can be used to efficiently produce a clustering of error O(ε) with respect to the target clustering, as can stability of the min-sum objective if the target clusters are sufficiently large.