• Corpus ID: 218684722

Stable and consistent density-based clustering

@article{Rolle2020StableAC,
  title={Stable and consistent density-based clustering},
  author={Alexander Rolle and Luis Scoccola},
  journal={ArXiv},
  year={2020},
  volume={abs/2005.09048}
}
We present a consistent approach to density-based clustering, which satisfies a stability theorem that holds without any distributional assumptions. We also show that the algorithm can be combined with standard procedures to extract a flat clustering from a hierarchical clustering, and that the resulting flat clustering algorithms satisfy stability theorems. The algorithms and proofs are inspired by topological data analysis. 
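The abstract describes the approach only at a high level. As a hedged illustration of what a density-based clustering computes, the sketch below produces a flat clustering at one fixed distance scale `r` and density threshold `k`. It is not the authors' algorithm, which is hierarchical and carries stability guarantees. The function name `density_clusters` and the density criterion "at least `k` other points within distance `r`" are assumptions chosen for the example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def density_clusters(points: Sequence[Point], r: float, k: int) -> List[int]:
    """Hypothetical helper: label each point with a cluster id, or -1 if not dense.

    Assumed density criterion: a point is "dense" when at least k *other*
    points lie within distance r of it. Dense points are grouped into the
    connected components of the graph joining dense points at distance <= r.
    """
    n = len(points)
    # Neighbors within distance r (closed ball), excluding the point itself.
    nbrs = [
        [j for j in range(n) if j != i and math.dist(points[i], points[j]) <= r]
        for i in range(n)
    ]
    dense = [len(nbrs[i]) >= k for i in range(n)]

    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if not dense[start] or labels[start] != -1:
            continue
        # Depth-first search that only walks through dense points.
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in nbrs[i]:
                if dense[j] and labels[j] == -1:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels
```

Each choice of `r` and `k` gives a different flat clustering, so committing to a single pair is an arbitrary choice. Hierarchical density-based methods instead track how clusters appear and merge as the parameters vary.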


Stability of 2-Parameter Persistent Homology

It is shown that several related density-sensitive constructions of bifiltrations from data satisfy stability properties accommodating the addition and removal of outliers, and 1-Lipschitz stability results closely analogous to the standard stability results for 1-parameter persistent homology.

The Degree-Rips Complexes of an Annulus with Outliers

The degree-Rips bifiltration is the most computable of the parameter-free, density-sensitive bifiltrations in topological data analysis. It is known that this construction is stable to small

Filtration-Domination in Bifiltered Graphs

An extensive experimental evaluation shows that in most cases more than 90% of the edges of the graph can be removed, which often substantially speeds up the computational pipeline of multiparameter topological data analysis and reduces its memory usage.

Stability for layer points

The theory of layer points is extended to the more general context of $\vec{v}$-hierarchical clusterings, to treat cases where a hierarchical clustering of a finite metric space $Y$ is interleaved with a hierarchical clustering of some sample $X \subseteq Y$.

Locally Persistent Categories And Metric Properties Of Interleaving Distances

This thesis presents a uniform treatment of different distances used in the applied topology literature. We introduce the notion of a locally persistent category, which is a category with a notion of

An Introduction to Multiparameter Persistence

In topological data analysis (TDA), one often studies the shape of data by constructing a filtered topological space, whose structure is then examined using persistent homology. However, a single

$\ell^p$-Distances on Multiparameter Persistence Modules

It is shown that on 1- or 2-parameter persistence modules over prime fields, $d_I^p$ is the universal metric satisfying a natural stability property; this result extends a stability result of Skraba and Turner for the $p$-Wasserstein distance on barcodes in the 1-parameter case, and is also a close analogue of a universality property for the interleaving distance given by the second author.

Characterization of Gromov-type geodesics

Classical results due to Gromov and to Petersen establish that, when endowed with the Gromov-Hausdorff distance $d_{\mathrm{GH}}$, the collection $\mathcal{M}$ of all isometry classes of compact metric spaces is a complete

Interleaving by Parts: Join Decompositions of Interleavings and Join-Assemblage of Geodesics

Metrics of interest in topological data analysis (TDA) are often explicitly or implicitly in the form of an interleaving distance $d_I$ between poset maps (i.e. order-preserving maps), e.g. the

References

Showing 1-10 of 37 references.

Density-Based Clustering Based on Hierarchical Density Estimates

This work proposes a theoretically and practically improved density-based hierarchical clustering method, which provides a clustering hierarchy from which a simplified tree of significant clusters can be constructed, and introduces a novel cluster stability measure.
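The first stage of the HDBSCAN* method summarized above builds a minimum spanning tree of the mutual reachability distance $d_{\mathrm{mreach}}(a, b) = \max(\mathrm{core}_k(a), \mathrm{core}_k(b), d(a, b))$, where $\mathrm{core}_k$ is a nearest-neighbor core distance. Below is a minimal brute-force sketch, assuming Euclidean points and $O(n^2)$ time. Conventions differ on whether a point counts as its own nearest neighbor; this sketch counts only the other points. The function names are hypothetical, and the later stages (condensing the hierarchy into a simplified cluster tree and scoring clusters by stability) are omitted.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def core_distances(points: Sequence[Point], k: int) -> List[float]:
    """Hypothetical helper: distance from each point to its k-th nearest *other* point."""
    if not 1 <= k < len(points):
        raise ValueError("need 1 <= k < number of points")
    out = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(d[k - 1])
    return out


def mutual_reachability_mst(points: Sequence[Point], k: int) -> List[Tuple[int, int, float]]:
    """Prim's algorithm on the complete graph weighted by mutual reachability.

    d_mreach(a, b) = max(core(a), core(b), d(a, b)).
    Returns the n - 1 tree edges as (i, j, weight), in the order Prim adds them.
    """
    n = len(points)
    core = core_distances(points, k)

    def mreach(i: int, j: int) -> float:
        return max(core[i], core[j], math.dist(points[i], points[j]))

    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight connecting v to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Take the cheapest vertex not yet in the tree (O(n^2) overall).
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best[u]))
        # Relax edges from u to every vertex still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                w = mreach(u, v)
                if w < best[v]:
                    best[v] = w
                    parent[v] = u
    return edges
```

Removing the tree's edges in decreasing order of weight splits the data into the nested clusters of the hierarchy.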

Generalized density clustering

We study generalized density-based clustering in which sharply defined clusters, such as clusters on lower-dimensional manifolds, are allowed. We show that accurate clustering is possible even in high

Characterization, Stability and Convergence of Hierarchical Clustering Methods

It is shown that within this framework, one can prove a theorem analogous to one of Kleinberg (2002), in which one obtains an existence and uniqueness theorem instead of a non-existence result.
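In the framework summarized above, a hierarchical clustering of a finite metric space is encoded as an ultrametric, and single linkage is the method singled out by the existence and uniqueness theorem. Below is a minimal sketch, assuming the metric is given as a symmetric distance matrix with zero diagonal. The single-linkage ultrametric is the minimax path distance $u(x, y) = \min_{x = x_0, \dots, x_m = y} \max_i d(x_i, x_{i+1})$, computed here with a Floyd-Warshall-style recurrence. The function name is hypothetical.

```python
from typing import List, Sequence


def single_linkage_ultrametric(d: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical helper: minimax path distance over all chains of points.

    u[i][j] is the smallest possible value of the largest step along any
    chain of points from i to j, i.e. the single-linkage merge height.
    """
    n = len(d)
    u = [list(row) for row in d]
    # Allow chains through intermediate point m, one m at a time.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via_m = max(u[i][m], u[m][j])
                if via_m < u[i][j]:
                    u[i][j] = via_m
    return u
```

Cutting the hierarchy at height $t$ puts $x$ and $y$ in the same cluster exactly when $u(x, y) \le t$.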

Stability of density-based clustering

This paper defines two notions of instability to measure the variability of the level set $L(\lambda)$ and the cluster tree $T$ as functions of the bandwidth $h$, and investigates the theoretical properties of these instability measures.

Accelerated Hierarchical Density Based Clustering

  • Leland McInnes, John Healy
    2017 IEEE International Conference on Data Mining Workshops (ICDMW)
  • 2017
The accelerated HDBSCAN* algorithm provides performance comparable to DBSCAN while supporting variable-density clusters and eliminating the need for the difficult-to-tune distance scale parameter epsilon, making it the default choice for density-based clustering.

Rates of convergence for the cluster tree

Finite-sample convergence rates are given for a cluster tree estimation algorithm (a robust variant of single linkage), together with lower bounds on the sample complexity of estimating the cluster tree.

Persistence-Based Clustering in Riemannian Manifolds

A clustering scheme is presented that combines a mode-seeking phase with a cluster-merging phase in the corresponding density map; the spatial locations of its output clusters are bound to those of the basins of attraction of the peaks of the density.
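The two phases of the scheme summarized above (the method is known as ToMATo) can be sketched with a union-find structure on a neighborhood graph. This is a minimal sketch, assuming the density estimate `f` at each vertex and the graph's adjacency lists are already given. The function name `tomato_like_clusters` and the parameter `tau` (a persistence threshold) are illustrative, and the paper's choice of density estimator and its final output step are not reproduced.

```python
from typing import List, Sequence


def tomato_like_clusters(
    f: Sequence[float], adj: Sequence[Sequence[int]], tau: float
) -> List[int]:
    """Hypothetical helper: label each vertex by the index of its cluster's peak.

    f   : density estimate at each vertex
    adj : adjacency lists of a neighborhood graph
    tau : persistence threshold; peaks less prominent than tau get merged
    """
    n = len(f)
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    order = sorted(range(n), key=lambda i: f[i], reverse=True)
    seen = [False] * n
    for i in order:
        # Neighbors already processed, i.e. with density at least f[i].
        upper = [j for j in adj[i] if seen[j]]
        seen[i] = True
        if not upper:
            continue  # i is a local peak and stays its own root
        # Mode seeking: attach i to the cluster of its highest upper neighbor.
        g = max(upper, key=lambda j: f[j])
        ei = find(g)
        parent[i] = ei
        # Merging: join neighboring clusters whose lower peak has prominence < tau.
        for j in upper:
            ej = find(j)
            if ej != ei and min(f[ei], f[ej]) < f[i] + tau:
                low, high = (ei, ej) if f[ei] < f[ej] else (ej, ei)
                parent[low] = high  # the lower peak joins the higher one
                ei = high
    return [find(i) for i in range(n)]
```

On a path graph with density values `[1.0, 3.0, 2.0, 2.5, 0.5]`, the peak at vertex 3 rises 0.5 above the saddle at vertex 2. With `tau=0.25` the function keeps two clusters (`[1, 1, 1, 3, 3]`), while with `tau=1.0` it merges them into one.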

Multiparameter Hierarchical Clustering Methods

This work proposes an extension of hierarchical clustering methods, called multiparameter hierarchical clustering methods, which are designed to exhibit sensitivity to density while retaining desirable theoretical properties, and presents both a characterization and a stability theorem for them.

Consistency of Single Linkage for High-Density Clusters

High-density clusters are defined on a population with density $f$ in $r$ dimensions to be the maximal connected sets of the form $\{x \mid f(x) \geq c\}$. Single-linkage clustering is evaluated for

A Generalized Single Linkage Method for Estimating the Cluster Tree of a Density

A graph-based method is presented that can approximate the cluster tree of any density estimate; the work also proposes excess mass as a measure of the size of a branch, reflecting both the height of the corresponding peak of the density above the surrounding valley floor and its spatial extent.