Corpus ID: 88523087

DBSCAN: Optimal Rates For Density Based Clustering

@article{Wang2017DBSCANOR,
  title={DBSCAN: Optimal Rates For Density Based Clustering},
  author={Daren Wang and Xin Yang Lu and Alessandro Rinaldo},
  journal={arXiv: Statistics Theory},
  year={2017}
}
We study the problem of optimal estimation of the density cluster tree under various assumptions on the underlying density. Building on the seminal work of Chaudhuri et al. [2014], we formulate a new notion of clustering consistency that is better suited to smooth densities, and derive minimax rates of consistency for cluster tree estimation for Hölder smooth densities of arbitrary degree $\alpha$. We present a computationally efficient, rate optimal cluster tree estimator based on a…
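The abstract is cut off before it names the estimator, so the following is only a generic illustration of what a cluster tree estimator computes at a single level, not the authors' method. It assumes a k-nearest-neighbour density proxy 1 / r_k(x), which is monotone in the usual k-NN density estimate, and treats the clusters at level λ as the connected components of the sample points whose proxy is at least λ, linking two such points when they are within distance `eps`. The function names and the parameters `k`, `eps`, and `level` are hypothetical.

```python
import math
from itertools import combinations


def knn_radius(points, k):
    """Distance from each point to its k-th nearest neighbour (the point itself excluded)."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


def level_set_clusters(points, k, eps, level):
    """Connected components of {x_i : 1 / r_k(x_i) >= level}, linking kept points within eps.

    The test `level * r <= 1` is the same as `1 / r >= level` but avoids
    dividing by zero when a point has duplicates (r == 0).
    """
    radii = knn_radius(points, k)
    keep = [i for i, r in enumerate(radii) if level * r <= 1.0]
    parent = {i: i for i in keep}

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(keep, 2):
        if math.dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)

    groups = {}
    for i in keep:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Sweeping `level` from low to high and recording how the components split and disappear gives a crude empirical cluster tree; the choice of `k` and `eps` is left open here.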
2 Citations
Change-point Detection for Sparse and Dense Functional Data in General Dimensions
TLDR
The consistency of FSBS for multiple change-point estimation is shown and a sharp localisation error rate is provided, which reveals an interesting phase transition phenomenon depending on the number of functional curves observed and the sampling frequency for each curve.
Faster DBSCAN via subsampled similarity queries
TLDR
An extensive experimental analysis is provided showing that on large datasets, one can subsample as little as $0.1\%$ of the neighborhood graph, leading to over 200x speedup and 250x reduction in RAM consumption compared to scikit-learn's implementation of DBSCAN, while still maintaining competitive clustering performance.
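The subsampling idea in the TLDR can be sketched as follows. This is a plausible reading under stated assumptions, not the paper's exact procedure: each pair of points is queried independently with probability `rate`, a sampled pair within distance `eps` becomes an edge, points with at least `min_pts` sampled neighbours are treated as core points, and clusters are grown from core points as in ordinary DBSCAN. How `min_pts` should be adjusted for the sampling rate is left to the caller; all names here are hypothetical.

```python
import math
import random
from itertools import combinations


def subsampled_dbscan(points, eps, min_pts, rate, seed=0):
    """Cluster labels (-1 = noise) from a randomly subsampled eps-neighbourhood graph."""
    rng = random.Random(seed)
    n = len(points)
    adj = [[] for _ in range(n)]
    # Each pair is queried with probability `rate`; thanks to short-circuiting,
    # the distance (the expensive similarity query) is computed only for sampled pairs.
    for i, j in combinations(range(n), 2):
        if rng.random() < rate and math.dist(points[i], points[j]) <= eps:
            adj[i].append(j)
            adj[j].append(i)
    # Core points: at least min_pts sampled neighbours (the point itself not counted).
    core = [len(adj[i]) >= min_pts for i in range(n)]
    labels = [-1] * n
    cluster = 0
    for start in range(n):
        if not core[start] or labels[start] != -1:
            continue
        # Flood-fill from a core point; border points are labelled but not expanded.
        stack = [start]
        labels[start] = cluster
        while stack:
            i = stack.pop()
            for j in adj[i]:
                if labels[j] == -1:
                    labels[j] = cluster
                    if core[j]:
                        stack.append(j)
        cluster += 1
    return labels
```

With `rate=1.0` every pair is queried, so the sketch reduces to a plain DBSCAN in which `min_pts` counts neighbours excluding the point itself (scikit-learn's `min_samples`, by contrast, counts the point itself).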

References

Adaptive Density Level Set Clustering
TLDR
This paper presents a simple algorithm that is able to asymptotically determine the optimal level, that is, the level at which there is the first split in the cluster tree of the data generating distribution.
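To make the "first split" concrete, the sketch below scans a user-supplied grid of levels in increasing order and returns the first level at which the estimated level set falls apart into two or more pieces. It is a one-dimensional toy resting on assumptions of this illustration (a k-NN density estimate k / (2 n r_k(x)), and kept points counted as disconnected when consecutive ones are more than `eps` apart), not the adaptive procedure of the cited paper; the names are hypothetical.

```python
def knn_density_1d(xs, k):
    """1-D k-NN density estimate k / (n * 2 * r_k(x)), r_k = distance to k-th nearest neighbour."""
    n = len(xs)
    dens = []
    for i, x in enumerate(xs):
        d = sorted(abs(x - y) for j, y in enumerate(xs) if j != i)
        r = d[k - 1]
        dens.append(k / (n * 2 * r) if r > 0 else float("inf"))
    return dens


def count_components_1d(xs, dens, level, eps):
    """Number of pieces of {x_i : dens_i >= level}, splitting wherever a gap exceeds eps."""
    kept = sorted(x for x, f in zip(xs, dens) if f >= level)
    if not kept:
        return 0
    return 1 + sum(1 for a, b in zip(kept, kept[1:]) if b - a > eps)


def first_split_level(xs, k, eps, levels):
    """Smallest level in `levels` whose estimated level set has at least two components."""
    dens = knn_density_1d(xs, k)
    for lam in sorted(levels):
        if count_components_1d(xs, dens, lam, eps) >= 2:
            return lam
    return None  # no split detected on this grid
```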
Consistent Procedures for Cluster Tree Estimation and Pruning
TLDR
A tree pruning procedure is studied that, under milder conditions than usual, is guaranteed to remove spurious clusters while recovering salient ones; lower bounds on the sample complexity of cluster tree estimation are also derived.
Set estimation: Another bridge between statistics and geometry.
TLDR
A nonexhaustive expository overview of set estimation theory is given, which presents the basic ideas, some typical tools involved in the theory and a few applications.
$U$-Processes: Rates of Convergence
A new stochastic process is introduced: a collection of U-statistics indexed by a family of symmetric kernels. Conditions are obtained for the uniform almost sure convergence
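For readers unfamiliar with the object in the abstract above: an order-2 U-statistic averages a symmetric kernel $h$ over all unordered pairs of the sample, $U_n(h) = \binom{n}{2}^{-1} \sum_{i<j} h(X_i, X_j)$, and a U-process is the map $h \mapsto U_n(h)$ over a family of kernels. The sketch evaluates it over a small, hypothetical family of indicator kernels $h_t(x, y) = 1\{|x - y| \le t\}$; the function names are illustrative only.

```python
from itertools import combinations
from math import comb


def u_statistic(sample, kernel):
    """Order-2 U-statistic: average of a symmetric kernel over all unordered pairs."""
    n = len(sample)
    total = sum(kernel(x, y) for x, y in combinations(sample, 2))
    return total / comb(n, 2)


def u_process(sample, kernels):
    """Evaluate the U-statistic for every kernel in an indexed family {name: kernel}."""
    return {name: u_statistic(sample, h) for name, h in kernels.items()}


# Hypothetical kernel family indexed by t; `t=t` binds each t at definition time.
kernels = {t: (lambda x, y, t=t: float(abs(x - y) <= t)) for t in (1, 2, 3)}
result = u_process([0.0, 1.0, 3.0], kernels)
```

Here the pairwise distances are 1, 3, and 2, so `result` holds the fraction of pairs within distance 1, 2, and 3 respectively.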
On boundary estimation
We consider the problem of estimating the boundary of a compact set $S \subset \mathbb{R}^d$ from a random sample of points taken from $S$. We use the Devroye-Wise estimator, which is a union of balls centred at the
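The Devroye-Wise estimator mentioned above estimates $S$ by the union of balls of a common radius centred at the sample points. A minimal membership test is sketched below; the radius is a tuning parameter (typically taken to shrink as the sample grows), and the function name is hypothetical.

```python
import math


def devroye_wise_contains(sample, radius, x):
    """True if x lies in the union of closed balls B(X_i, radius) centred at the sample points."""
    return any(math.dist(x, p) <= radius for p in sample)
```

The boundary estimate is then the boundary of this union of balls.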
A plug-in approach to support estimation
We suggest a new approach, based on the use of density estimators, for the problem of estimating the (compact) support of a multivariate density. This subject (motivated in terms of pattern analysis
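A plug-in support estimator of the kind described above takes the set where a density estimate exceeds a threshold, $\{x : \hat f_n(x) > c_n\}$. The one-dimensional sketch below assumes a Gaussian kernel density estimate with a fixed bandwidth and a fixed threshold; the kernel choice, the parameter values, and the names are assumptions of this illustration, not details taken from the cited paper.

```python
import math


def gaussian_kde_1d(sample, bandwidth):
    """Return the Gaussian kernel density estimate f_hat built from a 1-D sample."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def f_hat(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)

    return f_hat


def plugin_support(sample, bandwidth, threshold):
    """Membership test for the plug-in support estimate {x : f_hat(x) > threshold}."""
    f_hat = gaussian_kde_1d(sample, bandwidth)
    return lambda x: f_hat(x) > threshold
```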
Measuring Mass Concentrations and Estimating Density Contour Clusters-An Excess Mass Approach
By using empirical process theory, the so-called excess mass approach is studied. It can be applied to various statistical problems, especially in higher dimensions, such as testing for
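The excess mass of a distribution $P$ at level $\lambda$ is $E(\lambda) = \sup_C \{P(C) - \lambda |C|\}$, where $|C|$ is the Lebesgue measure of $C$ and the supremum runs over a class of candidate sets. The sketch computes the empirical version, with the empirical measure $P_n$ in place of $P$, assuming (for this illustration only) that the candidate sets are closed intervals on the real line. The empty set contributes 0, and it suffices to consider intervals whose endpoints are sample points, since shrinking an interval to its outermost sample points keeps its mass and reduces its length. The function name is hypothetical.

```python
def excess_mass_intervals(sample, lam):
    """Empirical excess mass sup_C {P_n(C) - lam * |C|} over closed intervals C.

    Returns (value, interval); interval is None when the empty set is optimal.
    """
    xs = sorted(sample)
    n = len(xs)
    best, best_interval = 0.0, None  # empty set: mass 0, length 0
    for i in range(n):
        for j in range(i, n):
            # [xs[i], xs[j]] holds at least j - i + 1 sample points.
            value = (j - i + 1) / n - lam * (xs[j] - xs[i])
            if value > best:
                best, best_interval = value, (xs[i], xs[j])
    return best, best_interval
```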
Single linkage clustering and continuum percolation
Suppose $f$ is a probability density function in $d$ dimensions, $d \ge 2$. A single linkage $a$-cluster on a sample of size $n$ from the density $f$ is a connected component of the union of balls of volume $a$,
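Following the definition above, single linkage $a$-clusters can be computed by choosing the common radius $r$ so that each ball has volume $a$, using the volume formula $\pi^{d/2} r^d / \Gamma(d/2 + 1)$ for a $d$-dimensional ball, and then taking connected components of the graph that links two sample points whenever their closed balls intersect, that is, whenever they are at most $2r$ apart. A minimal union-find sketch with hypothetical names:

```python
import math
from itertools import combinations


def radius_for_volume(a, d):
    """Radius r such that a d-dimensional ball of radius r has volume a."""
    unit_ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return (a / unit_ball) ** (1.0 / d)


def single_linkage_a_clusters(points, a):
    """Index groups of the connected components of the union of balls of volume a."""
    r = radius_for_volume(a, len(points[0]))
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Two closed balls of radius r intersect iff their centres are at most 2r apart.
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= 2 * r:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```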
Minimax theory of image reconstruction
Image processing is an increasingly important area of research and there exists a large variety of image reconstruction methods proposed by different authors. This book is concerned with a technique
Smoothing of Multivariate Data: Density Estimation and Visualization
TLDR
Smoothing of Multivariate Data is an excellent book for courses in multivariate analysis, data analysis, and nonparametric statistics at the upper-undergraduate and graduate levels and serves as a valuable reference for practitioners and researchers in the fields of statistics, computer science, economics, and engineering.