• Corpus ID: 221135780

Consistent k-Median: Simpler, Better and Robust

@article{Guo2021ConsistentKS,
  title={Consistent k-Median: Simpler, Better and Robust},
  author={Xiangyu Guo and Janardhan Kulkarni and Shi Li and Jiayi Xian},
  journal={ArXiv},
  year={2021},
  volume={abs/2008.06101}
}
In this paper we introduce and study the online consistent $k$-clustering with outliers problem, generalizing the non-outlier version of the problem studied in [Lattanzi-Vassilvitskii, ICML17]. We show that a simple local-search based online algorithm can give a bicriteria constant approximation for the problem with $O(k^2 \log^2 (nD))$ swaps of medians (recourse) in total, where $D$ is the diameter of the metric. When restricted to the problem without outliers, our algorithm is simpler… 
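
To make the notion of recourse concrete, here is a minimal Python sketch of single-swap local search for plain $k$-median (no outliers) that counts median swaps as points arrive online. It is an illustration under stated assumptions, not the paper's algorithm: the function names, the improvement threshold `eps`, and the rule of opening the first `k` points as medians are all hypothetical choices made for this example.

```python
from itertools import product


def cost(points, medians, dist):
    """k-median cost: each point pays its distance to the nearest median."""
    return sum(min(dist(p, m) for m in medians) for p in points)


def local_search_swaps(points, medians, dist, eps=0.1):
    """Apply single swaps while one cuts the cost below (1 - eps/k) * current.

    Returns the final medians and the number of swaps performed.
    The threshold rule is an assumption of this sketch.
    """
    medians = list(medians)
    k = len(medians)
    swaps = 0
    improved = True
    while improved:
        improved = False
        current = cost(points, medians, dist)
        # Try replacing median i with a non-median point p.
        for i, p in product(range(k), points):
            if p in medians:
                continue
            candidate = medians[:i] + [p] + medians[i + 1:]
            if cost(points, candidate, dist) < (1 - eps / k) * current:
                medians, swaps, improved = candidate, swaps + 1, True
                break
    return medians, swaps


def online_k_median(stream, k, dist, eps=0.1):
    """Process points one by one and accumulate the total recourse."""
    points, medians, total_swaps = [], [], 0
    for p in stream:
        points.append(p)
        if len(medians) < k:
            medians.append(p)  # opening the first k medians is not counted as a swap
            continue
        medians, s = local_search_swaps(points, medians, dist, eps)
        total_swaps += s
    return medians, total_swaps


if __name__ == "__main__":
    line_metric = lambda a, b: abs(a - b)
    medians, recourse = online_k_median([0, 1, 2, 100, 101, 102], 2, line_metric)
    print(medians, recourse)
```

The total number of swaps returned by `online_k_median` is the quantity that the abstract bounds by $O(k^2 \log^2 (nD))$ for the authors' algorithm; this sketch makes no claim of matching that bound.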

Online and Consistent Correlation Clustering

TLDR
This work studies the correlation clustering problem in the classic online setting with recourse, develops an algorithm that achieves logarithmic recourse per vertex in the worst case, and complements this result with a tight lower bound.

Optimal Fully Dynamic k-Centers Clustering

TLDR
It is proved that any algorithm for k-clustering tasks in arbitrary metric spaces, including k-means, k-medians, and k-centers, must make at least Ω(nk) distance queries to achieve any non-trivial approximation factor.

Consistent k-Clustering for General Metrics

TLDR
This work shows how to maintain a constant-factor approximation for the $k$-median problem by performing an optimal (up to polylogarithmic factors) number $\widetilde{O}(k)$ of center swaps.

References

Constant approximation for k-median and k-means with outliers via iterative rounding

TLDR
A new iterative rounding framework for many clustering problems is presented, and α1- and (α1 + ε)-approximation algorithms for the matroid and knapsack median problems, respectively, are given, improving upon the previous best approximation ratios.

Approximating k-median via pseudo-approximation

We present a novel approximation algorithm for k-median that achieves an approximation guarantee of 1+√3+ε, improving upon the decade-old ratio of 3+ε. Our approach is based on two components, each

Local Search Methods for k-Means with Outliers

TLDR
This work proposes a simple local search-based algorithm for k-means clustering with outliers and proves that this algorithm achieves constant-factor approximate solutions and can be combined with known sketching techniques to scale to large data sets.

Consistent k-Clustering

TLDR
A lower bound is proved, showing that $\Omega(k \log n)$ changes are necessary in the worst case for a wide range of objective functions, and an algorithm is given that needs only $O(k^2 \log^4 n)$ changes to maintain a constant competitive solution.

Fully Dynamic Consistent Facility Location

TLDR
The cost of the solution maintained by the algorithm at any time $t$ is very close to the cost of a solution obtained by quickly recomputing a solution from scratch at time $t$, while the algorithm has a much better running time.

A Constant-Factor Approximation Algorithm for the k-Median Problem

TLDR
This work presents the first constant-factor approximation algorithm for the metric k-median problem, improving upon the best previously known result of O(log k log log k), which was obtained by refining and derandomizing a randomized O(log n log log n)-approximation algorithm of Bartal.

Size Matters: Cardinality-Constrained Clustering and Outlier Detection via Conic Optimization

TLDR
A joint outlier detection and clustering problem is formulated, which assigns a prescribed number of datapoints to an auxiliary outlier cluster and performs cardinality-constrained K-means clustering on the residual dataset, treating the cluster cardinalities as a given input.

An Algorithm for Online K-Means Clustering

TLDR
It is shown that one can be competitive with the k-means objective while operating online and that, experimentally, the algorithm is not much worse than k-means++ while operating in a strictly more constrained computational model.

k-means--: A Unified Approach to Clustering and Outlier Detection

TLDR
It is proved that the problem is NP-hard and then a practical polynomial time algorithm is presented, which is guaranteed to converge to a local optimum, and the approach is formalized as a generalization of the k-means problem.

Algorithms for facility location problems with outliers

TLDR
This paper explores a generalization of various facility location problems to the case when only a specified fraction of the customers are to be served, and provides generalizations of various approximation algorithms to deal with this added constraint.