• Corpus ID: 218684722

Stable and consistent density-based clustering

@article{Rolle2020StableAC,
  title={Stable and consistent density-based clustering},
  author={Alexander Rolle and Luis Scoccola},
  journal={ArXiv},
  year={2020},
  volume={abs/2005.09048}
}
We present a consistent approach to density-based clustering, which satisfies a stability theorem that holds without any distributional assumptions. We also show that the algorithm can be combined with standard procedures to extract a flat clustering from a hierarchical clustering, and that the resulting flat clustering algorithms satisfy stability theorems. The algorithms and proofs are inspired by topological data analysis. 

Flattening Multiparameter Hierarchical Clustering Functors

TLDR
This work brings together topological data analysis, applied category theory, and machine learning to study multiparameter hierarchical clustering and introduces a Bayesian update algorithm for learning clustering parameters from data.

Stability of 2-Parameter Persistent Homology

TLDR
It is shown that several related density-sensitive constructions of bifiltrations from data satisfy stability properties accommodating the addition and removal of outliers, and 1-Lipschitz stability results closely analogous to the standard stability results for 1-parameter persistent homology.

Stability for layer points

TLDR
The theory of layer points is generalized to the context of ~v-hierarchical clusterings, covering cases where a hierarchical clustering of a finite metric space Y is interleaved with a hierarchical clustering of some sample X ⊆ Y.

Locally Persistent Categories And Metric Properties Of Interleaving Distances

This thesis presents a uniform treatment of different distances used in the applied topology literature. We introduce the notion of a locally persistent category, which is a category with a notion of

An Introduction to Multiparameter Persistence

In topological data analysis (TDA), one often studies the shape of data by constructing a filtered topological space, whose structure is then examined using persistent homology. However, a single

𝓁p-Distances on Multiparameter Persistence Modules

TLDR
It is shown that on 1- or 2-parameter persistence modules over prime fields, $d_I^p$ is the universal metric satisfying a natural stability property; this result extends a stability result of Skraba and Turner for the p-Wasserstein distance on barcodes in the 1-parameter case, and is also a close analogue of a universality property for the interleaving distance given by the second author.

Characterization of Gromov-type geodesics

Classical results due to Gromov and to Petersen establish that, when endowed with the Gromov-Hausdorff distance $d_{GH}$, the collection M of all isometry classes of compact metric spaces is a complete

Interleaving by Parts: Join Decompositions of Interleavings and Join-Assemblage of Geodesics

Metrics of interest in topological data analysis (TDA) are often explicitly or implicitly in the form of an interleaving distance $d_I$ between poset maps (i.e. order-preserving maps), e.g. the

Rectification of interleavings and a persistent Whitehead theorem

The homotopy interleaving distance, a distance between persistent spaces, was introduced by Blumberg and Lesnick and shown to be universal, in the sense that it is the largest homotopy-invariant

References

Density-Based Clustering Based on Hierarchical Density Estimates

TLDR
This work proposes a theoretically and practically improved density-based, hierarchical clustering method, providing a clustering hierarchy from which a simplified tree of significant clusters can be constructed, and proposes a novel cluster stability measure.

Generalized density clustering

We study generalized density-based clustering in which sharply defined clusters such as clusters on lower-dimensional manifolds are allowed. We show that accurate clustering is possible even in high

Characterization, Stability and Convergence of Hierarchical Clustering Methods

TLDR
It is shown that within this framework, one can prove a theorem analogous to one of Kleinberg (2002), in which one obtains an existence and uniqueness theorem instead of a non-existence result.

Accelerated Hierarchical Density Based Clustering

  • Leland McInnes, John Healy
  • Computer Science, Physics
    2017 IEEE International Conference on Data Mining Workshops (ICDMW)
  • 2017
TLDR
The accelerated HDBSCAN* algorithm provides comparable performance to DBSCAN, while supporting variable-density clusters and eliminating the need for the difficult-to-tune distance scale parameter epsilon, making it the default choice for density-based clustering.

Rates of convergence for the cluster tree

TLDR
Finite-sample convergence rates for the algorithm and lower bounds on the sample complexity of this estimation problem are given.

Persistence-Based Clustering in Riemannian Manifolds

TLDR
A clustering scheme that combines a mode-seeking phase with a cluster merging phase in the corresponding density map, and whose output clusters have the property that their spatial locations are bound to the ones of the basins of attraction of the peaks of the density.

Multiparameter Hierarchical Clustering Methods

TLDR
This work proposes an extension of hierarchical clustering methods, called multiparameter hierarchical clustering methods, which are designed to exhibit sensitivity to density while retaining desirable theoretical properties, and presents both a characterization and a stability theorem.

Consistency of Single Linkage for High-Density Clusters

High-density clusters are defined on a population with density f in r dimensions to be the maximal connected sets of form {x | f(x) ≥ c}. Single-linkage clustering is evaluated for

A Generalized Single Linkage Method for Estimating the Cluster Tree of a Density

TLDR
A graph-based method is presented that can approximate the cluster tree of any density estimate and proposes excess mass as a measure for the size of a branch, reflecting the height of the corresponding peak of the density above the surrounding valley floor as well as its spatial extent.

Beyond Hartigan Consistency: Merge Distortion Metric for Hierarchical Clustering

TLDR
Two limit properties, separation and minimality, are identified, which address both over-segmentation and improper nesting and together imply (but are not implied by) Hartigan consistency.