# Approximate clustering via core-sets

@inproceedings{Badoiu2002ApproximateCV,
title={Approximate clustering via core-sets},
author={Mihai Badoiu and Sariel Har-Peled and Piotr Indyk},
booktitle={STOC '02},
year={2002}
}
• Published in STOC '02, 19 May 2002
• Mathematics, Computer Science
In this paper, we show that for several clustering problems one can extract a small set of points, so that using those core-sets enables us to perform approximate clustering efficiently. The surprising property of these core-sets is that their size is independent of the dimension. Using them, we present (1+ε)-approximation algorithms for the k-center clustering and k-median clustering problems in Euclidean space. The running time of the new algorithms has linear or near linear dependency on…
432 Citations
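The 1-center (minimum enclosing ball) case gives the cleanest picture of the core-set idea from the abstract: a small subset of the input, of size depending only on ε, whose enclosing ball already approximates that of the whole set. Below is a minimal Python sketch in the spirit of the farthest-point iteration (it follows the later simplified form due to Bădoiu and Clarkson rather than this paper's exact procedure); points are assumed to be plain coordinate tuples.

```python
import math

def approx_min_enclosing_ball(points, eps):
    """Approximate 1-center via farthest-point iteration: start at an
    arbitrary input point and repeatedly step toward the point farthest
    from the current center.  After O(1/eps^2) iterations the visited
    points form a core-set and the returned ball is a (1+eps)-approximate
    minimum enclosing ball.  Note the iteration count is independent of
    the dimension, which is the key property highlighted in the paper."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    center = list(points[0])
    coreset = [points[0]]
    iters = int(math.ceil(1.0 / eps ** 2))
    for i in range(1, iters + 1):
        # farthest input point from the current center
        far = max(points, key=lambda p: dist2(p, center))
        coreset.append(far)
        # move the center a 1/(i+1) fraction toward the farthest point
        center = [c + (f - c) / (i + 1) for c, f in zip(center, far)]
    radius = max(math.sqrt(dist2(p, center)) for p in points)
    return center, radius, coreset
```

The returned ball always encloses every input point, so its radius is at least the optimum; the iteration bound guarantees it is at most (1+ε) times the optimum.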

#### Citations

Clustering in High Dimensions
This thesis shows that, for several clustering problems, a small set of points can be extracted so that, using these core-sets, one can perform approximate clustering efficiently, and develops a (1+ε)-approximation algorithm for the 1-cylinder clustering problem.
Efficient approximation algorithms for clustering point-sets
• Computer Science, Mathematics
• Comput. Geom.
• 2010
An O(m + n log k)-time 3-approximation algorithm and an O(km)-time (1+√3)-approximation algorithm are presented for the k-center clustering problem on point-sets, where m is the total number of input points and k is the number of clusters.
On coresets for k-means and k-median clustering
• Mathematics, Computer Science
• STOC '04
• 2004
This paper shows the existence of small coresets for computing k-median/means clustering of points in low dimension, and improves the fastest known algorithms for (1+ε)-approximate k-means and k-median.
Greedy Strategy Works for Clustering with Outliers and Coresets Construction
It is shown that this greedy strategy can handle k-center/median/means clustering with outliers efficiently, in terms of both quality and complexity, and can be applied to speed up the popular density-based clustering approach DBSCAN.
Linear Time Algorithms for Clustering Problems in Any Dimensions
• Mathematics, Computer Science
• ICALP
• 2005
This work generalizes the k-means algorithm and shows that the resulting algorithm can solve a larger class of clustering problems that satisfy certain properties (existence of a random sampling procedure and tightness), yielding O(2^((k/ε)^O(1)) dn)-time (1+ε)-approximation algorithms for these problems.
Solving k-center Clustering (with Outliers) in MapReduce and Streaming, almost as Accurately as Sequentially
• Computer Science
• Proc. VLDB Endow.
• 2019
This paper presents coreset-based 2-round MapReduce algorithms for center-based clustering, and shows that the algorithms become very space-efficient for the important case of small (constant) D.
Fast and exact out-of-core k-means clustering
• Computer Science
• Fourth IEEE International Conference on Data Mining (ICDM'04)
• 2004
This paper presents a new algorithm which typically requires only one or a small number of passes on the entire dataset, and provably produces the same cluster centers as reported by the original k-means algorithm.
Fast approximate spectral clustering
• Mathematics, Computer Science
• KDD
• 2009
This work develops a general framework for fast approximate spectral clustering in which a distortion-minimizing local transformation is first applied to the data, and develops two concrete instances of this framework, one based on local k-means clustering (KASP) and one based on random projection trees (RASP).
Greedy Strategy Works for k-Center Clustering with Outliers and Coreset Construction
• Computer Science
• ESA
• 2019
It is shown that this greedy strategy can handle ordinary clustering with outliers efficiently, in terms of clustering quality and time complexity, and that the greedy approach yields a small coreset for the problem in doubling metrics, reducing the time complexity significantly.
A Structural Theorem for Center-Based Clustering in High-Dimensional Euclidean Space
We prove that, for any finite set X in Euclidean space of any dimension and for any fixed $$\varepsilon >0$$, there exists a polynomial-cardinality set of points in this space which can be…

#### References

Showing 1–10 of 20 references
Clustering Motion
A linear-time algorithm is presented for computing a 2-approximation to the k-center clustering of a set of n points in ℝ^d, slightly improving the algorithm of Feder and Greene, which runs in Θ(n log k) time (optimal in the algebraic decision tree model).
Polynomial time approximation schemes for geometric k-clustering
• Computer Science, Mathematics
• Proceedings 41st Annual Symposium on Foundations of Computer Science
• 2000
This work deals with the problem of clustering data points, and gives polynomial-time approximation schemes for this problem in several settings, including the binary cube {0,1}^d with Hamming distance, and ℝ^d with either L_1 or L_2 distance.
Approximation schemes for Euclidean k-medians and related problems
• Mathematics, Computer Science
• STOC '98
• 1998
An approximation scheme for the plane that, for any c > 0, produces a solution of cost at most 1 + 1/c times the optimum and runs in time O(n), and generalizes to some problems related to k-median.
Exact and Approximation Algorithms for Clustering
• Mathematics, Computer Science
• SODA '98
• 1998
An n^(O(k^(1−1/d)))-time algorithm is given for solving the k-center problem in ℝ^d under the L_∞ and L_2 metrics; the approach extends to other metrics and to the discrete k-center problem.
Sublinear time approximate clustering
• Mathematics, Computer Science
• SODA '01
• 2001
This paper motivates and introduces a new model of clustering that is in the spirit of the “PAC (probably approximately correct)” learning model, and gives examples of efficient PAC-clustering algorithms.
Testing of clustering
• Mathematics, Computer Science
• Proceedings 41st Annual Symposium on Foundations of Computer Science
• 2000
The benefit of the algorithms is that they construct an implicit representation of such clusterings in time independent of |X|, and the implicit representation can be used to answer queries concerning which cluster a given point belongs to.
Clustering to Minimize the Maximum Intercluster Distance
• T. Gonzalez
• Computer Science, Mathematics
• Theor. Comput. Sci.
• 1985
An O(kn)-time approximation algorithm is presented that guarantees solutions with an objective function value within twice the optimal value, and it is shown that this guarantee holds as long as the set of points satisfies the triangle inequality.
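Gonzalez's greedy strategy (farthest-first traversal) is short enough to sketch directly; a minimal Python version, assuming input points are Euclidean coordinate tuples:

```python
def k_center_greedy(points, k):
    """Gonzalez's farthest-first traversal: pick an arbitrary first
    center, then repeatedly add the point farthest from the centers
    chosen so far.  Under the triangle inequality, the k chosen centers
    give a 2-approximation to the optimal maximum cluster radius."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    centers = [points[0]]                 # arbitrary first center
    # d[i] = squared distance from points[i] to its nearest center so far
    d = [dist2(p, centers[0]) for p in points]
    for _ in range(1, k):
        far = max(range(len(points)), key=lambda i: d[i])
        centers.append(points[far])
        # update nearest-center distances against the new center
        d = [min(d[i], dist2(points[i], centers[-1]))
             for i in range(len(points))]
    return centers
```

Maintaining the nearest-center distance array makes each round a single scan, matching the O(kn) running time stated in the entry above.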
Derandomized dimensionality reduction with applications
• Mathematics, Computer Science
• SODA '02
• 2002
A procedure is provided that constructs such a mapping deterministically, in time almost linear in the number of distances to preserve times the dimension of the original space; it is used to obtain an efficient derandomization of several approximation algorithms based on semidefinite programming.
Reductions among high dimensional proximity problems
• Mathematics, Computer Science
• SODA '01
• 2001
Improved running times for a wide range of approximate high-dimensional proximity problems are presented via reductions to nearest-neighbour queries, obtaining subquadratic running time for each.
Approximate shape fitting via linearization
• Computer Science
• Proceedings 42nd IEEE Symposium on Foundations of Computer Science
• 2001
The authors present a general and unified technique for solving a certain family of shape fitting problems without resorting to a careful and painful case by case analysis, as was previously done for those problems.