Corpus ID: 195874370

# Coresets for Clustering in Graphs of Bounded Treewidth

@inproceedings{Braverman2020CoresetsFC,
title={Coresets for Clustering in Graphs of Bounded Treewidth},
author={Vladimir Braverman and Lingxiao Huang and Shaofeng H.-C. Jiang and Robert Krauthgamer and Xuan-Wei Wu},
booktitle={ICML},
year={2020}
}
We initiate the study of coresets for clustering in graph metrics, i.e., the shortest-path metric of edge-weighted graphs. Such clustering problems (on graph metrics) are essential to data analysis and used for example in road networks and data visualization. Specifically, we consider $(k, z)$-Clustering, where given a metric space $(V, d)$, the goal is to minimize, over all $k$-point center sets $C$, the objective $\sum_{x \in V}{d^z(x, C)}$. This problem is a well-known generalization of both… Expand
A new coreset framework for clustering
• Computer Science
• STOC
• 2021
A new, simple coreset framework is presented that simultaneously improves upon the best known bounds for a large variety of settings, ranging from Euclidean space, doubling metric, minor-free metric, and the general metric cases. Expand
On Coresets for Fair Clustering in Metric and Euclidean Spaces and Their Applications
• Computer Science
• ICALP
• 2021
The new coreset construction scheme is fairly general and gives rise to coresets for a wide range of constrained clustering problems, which leads to improved constant-approximations for these problems in general metrics and near-linear time $(1+\epsilon)$-app approximations in the Euclidean metric. Expand
Coresets for clustering in Euclidean spaces: importance sampling is nearly optimal
• Mathematics, Computer Science
• STOC
• 2020
A unified two-stage importance sampling framework that constructs an ε-coreset for the (k,z)-clustering problem and relies on a new dimensionality reduction technique that connects two well-known shape fitting problems: subspace approximation and clustering, and may be of independent interest. Expand
Coresets for k-median clustering under Fréchet and Hausdorff distances
The authors' is the first such result, where the size of the coreset is independent of the number of input curves/point sets to be clustered (although it still depends on the maximum complexity of each input object). Expand
One Backward from Ten Forward, Subsampling for Large-Scale Deep Learning
A novel optimization framework is proposed to analyze this problem and an efficient approximation algorithm under the framework of Mini-batch gradient descent as a practical solution is provided. Expand
Coresets for Clustering in Excluded-minor Graphs and Beyond
• Computer Science, Mathematics
• SODA
• 2021
This work obtains an efficient coreset construction in high-dimensional Euclidean spaces, thereby matching and simplifying state-of-the-art results (Sohler and Woodruff, FOCS 2018; Huang and Vishnoi, STOC 2020). Expand
Coresets for Clustering with Missing Values
• Computer Science, Mathematics
• ArXiv
• 2021
The first coreset for clustering points in R that have multiple missing values (coordinates) is provided, which exhibits a flexible tradeoff between coreset size and accuracy, and generally outperforms the uniformsampling baseline. Expand

#### References

SHOWING 1-10 OF 69 REFERENCES
Coresets for Ordered Weighted Clustering
• Mathematics, Computer Science
• ICML
• 2019
The main result is a construction of a simultaneous coreset of size $O_{\epsilon, d}(k^2 \log^2 |X|)$ for Ordered k-Median, which translates to a massive speedup of clustering computations, while maintaining high accuracy for a range of weights. Expand
On Coresets for k-Median and k-Means Clustering in Metric and Euclidean Spaces and Their Applications
• K. Chen
• Mathematics, Computer Science
• SIAM J. Comput.
• 2009
These are the first streaming algorithms, for those problems, that have space complexity with polynomial dependency on the dimension, using $O(d^2k^2\varepsilon^{-2}\log^8n)$ space. Expand
Efficient approximation schemes for uniform-cost clustering problems in planar graphs
• Mathematics, Computer Science
• ESA
• 2019
A new construction of a "coreset for facilities" for planar graphs that in polynomial time one can compute a subset of facilities of size F_0 with a guarantee that there is a $(1+\epsilon)$-approximate solution contained in $F_0$. Expand
Epsilon-Coresets for Clustering (with Outliers) in Doubling Metrics
• Computer Science, Mathematics
• 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS)
• 2018
The first relation between the doubling dimension of M(X, d) and the shattering dimension of the range space induced by the distance d is established and an upper bound of O(ddim(M)⋅ log(1/ε) +log log 1/τ) is proved for the probabilistic shattering dimension for even weighted doubling metrics. Expand
Quick k-Median, k-Center, and Facility Location for Sparse Graphs
• M. Thorup
• Computer Science, Mathematics
• SIAM J. Comput.
• 2004
We present $\tilde{O}(m)$ time and space constant factor approximation algorithms for the k-median, k-center, and facility location problems with assignment costs being shortest path distances in aExpand
VC-dimension and Erdős-Pósa property
• Mathematics, Computer Science
• Discret. Math.
• 2015
The goal is to generalize the proof of Chepoi et?al. (2007) with the unique assumption of bounded distance VC-dimension of neighborhoods, in other words, the set of balls of fixed radius in a graph with bounded distanceVC-dimension has the Erd?s-Posa property. Expand
On coresets for k-means and k-median clustering
• Mathematics, Computer Science
• STOC '04
• 2004
This paper shows the existence of small coresets for the problems of computing k-median/means clustering for points in low dimension, and improves the fastest known algorithms for (1+ε)-approximate k-means and k- median. Expand
k-Means Clustering of Lines for Big Data
• Computer Science, Mathematics
• NeurIPS
• 2019
It is proved that there is always a weighted subset (called coreset) of $dk^{O(k)}\log (n)/\epsilon^2$ lines in $L$ that approximates the sum of squared distances from £L to any given set of $k$ points, which implies results for a streaming set of lines to machines in one pass. Expand
Coresets for Clustering with Fairness Constraints
• Computer Science, Mathematics
• NeurIPS
• 2019
An approach to clustering with fairness constraints that involve multiple, non-disjoint types, that is also scalable and achieves a speed-up to recent fair clustering algorithms by incorporating the first known coreset construction for theFair clustering problem with thek-median objective. Expand
Shortest Paths in Digraphs of Small Treewidth. Part I: Sequential Algorithms
• Computer Science, Mathematics
• Algorithmica
• 2000
Abstract. We consider the problem of preprocessing an n -vertex digraph with real edge weights so that subsequent queries for the shortest path or distance between any two vertices can be efficientlyExpand