Corpus ID: 195874370

Coresets for Clustering in Graphs of Bounded Treewidth

@inproceedings{Braverman2020CoresetsFC,
  title={Coresets for Clustering in Graphs of Bounded Treewidth},
  author={Vladimir Braverman and Lingxiao Huang and Shaofeng H.-C. Jiang and Robert Krauthgamer and Xuan-Wei Wu},
  booktitle={ICML},
  year={2020}
}
We initiate the study of coresets for clustering in graph metrics, i.e., the shortest-path metric of edge-weighted graphs. Such clustering problems (on graph metrics) are essential to data analysis and used for example in road networks and data visualization. Specifically, we consider $(k, z)$-Clustering, where given a metric space $(V, d)$, the goal is to minimize, over all $k$-point center sets $C$, the objective $\sum_{x \in V}{d^z(x, C)}$. This problem is a well-known generalization of both… Expand
A new coreset framework for clustering
TLDR
A new, simple coreset framework is presented that simultaneously improves upon the best known bounds for a large variety of settings, ranging from Euclidean space, doubling metric, minor-free metric, and the general metric cases. Expand
On Coresets for Fair Clustering in Metric and Euclidean Spaces and Their Applications
TLDR
The new coreset construction scheme is fairly general and gives rise to coresets for a wide range of constrained clustering problems, which leads to improved constant-approximations for these problems in general metrics and near-linear time $(1+\epsilon)$-app approximations in the Euclidean metric. Expand
Coresets for clustering in Euclidean spaces: importance sampling is nearly optimal
TLDR
A unified two-stage importance sampling framework that constructs an ε-coreset for the (k,z)-clustering problem and relies on a new dimensionality reduction technique that connects two well-known shape fitting problems: subspace approximation and clustering, and may be of independent interest. Expand
Coresets for k-median clustering under Fréchet and Hausdorff distances
TLDR
The authors' is the first such result, where the size of the coreset is independent of the number of input curves/point sets to be clustered (although it still depends on the maximum complexity of each input object). Expand
One Backward from Ten Forward, Subsampling for Large-Scale Deep Learning
TLDR
A novel optimization framework is proposed to analyze this problem and an efficient approximation algorithm under the framework of Mini-batch gradient descent as a practical solution is provided. Expand
Coresets for Clustering in Excluded-minor Graphs and Beyond
TLDR
This work obtains an efficient coreset construction in high-dimensional Euclidean spaces, thereby matching and simplifying state-of-the-art results (Sohler and Woodruff, FOCS 2018; Huang and Vishnoi, STOC 2020). Expand
Coresets for Clustering with Missing Values
TLDR
The first coreset for clustering points in R that have multiple missing values (coordinates) is provided, which exhibits a flexible tradeoff between coreset size and accuracy, and generally outperforms the uniformsampling baseline. Expand

References

SHOWING 1-10 OF 69 REFERENCES
Coresets for Ordered Weighted Clustering
TLDR
The main result is a construction of a simultaneous coreset of size $O_{\epsilon, d}(k^2 \log^2 |X|)$ for Ordered k-Median, which translates to a massive speedup of clustering computations, while maintaining high accuracy for a range of weights. Expand
On Coresets for k-Median and k-Means Clustering in Metric and Euclidean Spaces and Their Applications
  • K. Chen
  • Mathematics, Computer Science
  • SIAM J. Comput.
  • 2009
TLDR
These are the first streaming algorithms, for those problems, that have space complexity with polynomial dependency on the dimension, using $O(d^2k^2\varepsilon^{-2}\log^8n)$ space. Expand
Efficient approximation schemes for uniform-cost clustering problems in planar graphs
TLDR
A new construction of a "coreset for facilities" for planar graphs that in polynomial time one can compute a subset of facilities of size F_0 with a guarantee that there is a $(1+\epsilon)$-approximate solution contained in $F_0$. Expand
Epsilon-Coresets for Clustering (with Outliers) in Doubling Metrics
TLDR
The first relation between the doubling dimension of M(X, d) and the shattering dimension of the range space induced by the distance d is established and an upper bound of O(ddim(M)⋅ log(1/ε) +log log 1/τ) is proved for the probabilistic shattering dimension for even weighted doubling metrics. Expand
Quick k-Median, k-Center, and Facility Location for Sparse Graphs
  • M. Thorup
  • Computer Science, Mathematics
  • SIAM J. Comput.
  • 2004
We present $\tilde{O}(m)$ time and space constant factor approximation algorithms for the k-median, k-center, and facility location problems with assignment costs being shortest path distances in aExpand
VC-dimension and Erdős-Pósa property
TLDR
The goal is to generalize the proof of Chepoi et?al. (2007) with the unique assumption of bounded distance VC-dimension of neighborhoods, in other words, the set of balls of fixed radius in a graph with bounded distanceVC-dimension has the Erd?s-Posa property. Expand
On coresets for k-means and k-median clustering
TLDR
This paper shows the existence of small coresets for the problems of computing k-median/means clustering for points in low dimension, and improves the fastest known algorithms for (1+ε)-approximate k-means and k- median. Expand
k-Means Clustering of Lines for Big Data
TLDR
It is proved that there is always a weighted subset (called coreset) of $dk^{O(k)}\log (n)/\epsilon^2$ lines in $L$ that approximates the sum of squared distances from £L to any given set of $k$ points, which implies results for a streaming set of lines to machines in one pass. Expand
Coresets for Clustering with Fairness Constraints
TLDR
An approach to clustering with fairness constraints that involve multiple, non-disjoint types, that is also scalable and achieves a speed-up to recent fair clustering algorithms by incorporating the first known coreset construction for theFair clustering problem with thek-median objective. Expand
Shortest Paths in Digraphs of Small Treewidth. Part I: Sequential Algorithms
Abstract. We consider the problem of preprocessing an n -vertex digraph with real edge weights so that subsequent queries for the shortest path or distance between any two vertices can be efficientlyExpand
...
1
2
3
4
5
...