# hdbscan: Hierarchical density based clustering

@article{McInnes2017hdbscanHD,
title={hdbscan: Hierarchical density based clustering},
author={Leland McInnes and John Healy and S. Astels},
journal={J. Open Source Softw.},
year={2017},
volume={2},
pages={205}
}
• Published 2017
• Computer Science
• J. Open Source Softw.
HDBSCAN: Hierarchical Density-Based Spatial Clustering of Applications with Noise (Campello, Moulavi, and Sander 2013), (Campello et al. 2015). [...] The library also includes support for Robust Single Linkage clustering (Chaudhuri et al. 2014), (Chaudhuri and Dasgupta 2010), GLOSH outlier detection (Campello et al. 2015), and tools for visualizing and exploring cluster structures. Finally, support for prediction and soft clustering is also available.
428 Citations

#### Topics from this paper

HDBSCAN(ε̂): An Alternative Cluster Extraction Method for HDBSCAN
• Computer Science
• ArXiv
• 2019
This work proposes an alternative method for selecting clusters from the HDBSCAN hierarchy that is particularly useful for data sets with variable densities, where users require a low minimum cluster size but want to avoid an abundance of micro-clusters in high-density regions.
HDBSCAN($\hat{\epsilon}$): An Alternative Cluster Extraction Method for HDBSCAN
• Computer Science
• 2019
This work proposes an alternative method for selecting clusters from the HDBSCAN hierarchy that uses an additional input parameter $\hat{\epsilon}$ and acts like a hybrid between DBSCAN* and HDBSCAN.
FISHDBC: Flexible, Incremental, Scalable, Hierarchical Density-Based Clustering for Arbitrary Data and Distance
FISHDBC is a flexible, incremental, scalable, and hierarchical density-based clustering algorithm that approximates HDBSCAN*, an evolution of DBSCAN.
DenMune: Density peak based clustering using mutual nearest neighbors
• Computer Science
• Pattern Recognit.
• 2021
A novel clustering algorithm based on identifying dense regions using mutual nearest neighborhoods of size K, where K is the only parameter required from the user. The algorithm obeys the mutual nearest neighbor consistency principle and produces robust results on various low- and high-dimensional datasets relative to several known state-of-the-art clustering algorithms.
Accelerated Hierarchical Density Based Clustering
• Mathematics, Computer Science
• 2017 IEEE International Conference on Data Mining Workshops (ICDMW)
• 2017
The accelerated HDBSCAN* algorithm provides comparable performance to DBSCAN while supporting variable-density clusters and eliminating the need for the difficult-to-tune distance scale parameter epsilon, making it the default choice for density-based clustering.
Chameleon 2
• Computer Science
• ACM Trans. Knowl. Discov. Data
• 2019
This work proposes an improved graph-based clustering algorithm called Chameleon 2, which overcomes several drawbacks of state-of-the-art clustering approaches. The authors modified the internal cluster quality measure and added an extra step to ensure algorithm robustness.
Condorcet Optimal Clustering with Delaunay Triangulation: Climate Zones and World Happiness Insights
A novel modification to Condorcet clustering methods is proposed, which improves it significantly on both accounts and works particularly well when applied to social network type data sets.
Finding landmarks within settled areas using hierarchical density-based clustering and meta-data from publicly available images
• Computer Science
• Expert Syst. Appl.
• 2019
Two novel density-based clustering algorithms that can be applied to determine relevant landmarks within a certain region are presented: K-DBSCAN, a clustering algorithm based on Gaussian kernels used to detect individual inhabited cores within regions; and V-DBSCAN, a hierarchical algorithm suitable for sample spaces with variable density, which is used to attempt the discovery of relevant landmarks in cities or regions.
Clustering tendency assessment for datasets having inter-cluster density variations
• Computer Science
• 2020 International Conference on Signal Processing and Communications (SPCOM)
• 2020
Numerical experiments comparing the proposed novel approach with baseline VAT/iVAT as well as spectral clustering and density-based clustering algorithms establish that LS-VAT and LS-iVAT are superior to the comparable algorithms in terms of clustering quality.
AMTICS: Aligning Micro-clusters to Identify Cluster Structures
• Computer Science
• DASFAA
• 2020
AMTICS is developed as a novel and efficient divide-and-conquer approach to pre-cluster data in distributed instances and align the results in a hierarchy afterward.

#### References

Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection
• Computer Science, Mathematics
• ACM Trans. Knowl. Discov. Data
• 2015
An integrated framework for density-based cluster analysis, outlier detection, and data visualization is introduced, consisting of an algorithm to compute hierarchical estimates of the level sets of a density, following Hartigan’s classic model of density-contour clusters and trees.
Density-Based Clustering Based on Hierarchical Density Estimates
• Mathematics, Computer Science
• PAKDD
• 2013
This work proposes a theoretically and practically improved density-based, hierarchical clustering method, providing a clustering hierarchy from which a simplified tree of significant clusters can be constructed, and proposes a novel cluster stability measure.
Consistent Procedures for Cluster Tree Estimation and Pruning
• Mathematics, Computer Science
• IEEE Transactions on Information Theory
• 2014
A tree pruning procedure is studied that is guaranteed, under milder conditions than usual, to remove clusters that are spurious while recovering those that are salient, and lower bounds on the sample complexity of cluster tree estimation are derived.
Rates of convergence for the cluster tree
• Computer Science, Mathematics
• NIPS
• 2010
Finite-sample convergence rates for the algorithm and lower bounds on the sample complexity of this estimation problem are given.
hdbscan: Hierarchical density based clustering
• Computer Science
• J. Open Source Softw.
• 2017
HDBSCAN performs DBSCAN over varying epsilon values and integrates the result to find a clustering that gives the best stability over epsilon, which allows HDBSCAN to find clusters of varying densities and to be more robust to parameter selection.
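
The summary above says that a single epsilon is limiting when clusters differ in density. The following is a minimal sketch of that motivation only, not of HDBSCAN itself and not of the hdbscan library's API: it runs a plain DBSCAN over a sweep of epsilon values on toy 1-D data. The function names `dbscan_1d` and `n_clusters`, the data, and the epsilon sweep are all assumptions made for illustration.

```python
from typing import List

NOISE = -1


def dbscan_1d(points: List[float], eps: float, min_samples: int) -> List[int]:
    """Plain DBSCAN on 1-D points (illustrative helper, not library code).

    Returns one label per point; -1 marks noise.
    """
    n = len(points)
    # Each neighborhood includes the point itself, as in the usual definition.
    neighbors = [
        [j for j in range(n) if abs(points[i] - points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_samples for nb in neighbors]
    labels = [NOISE] * n
    cluster = 0
    for i in range(n):
        if labels[i] != NOISE or not is_core[i]:
            continue
        # Grow a new cluster outward from this unlabeled core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels


def n_clusters(labels: List[int]) -> int:
    """Count clusters, ignoring the noise label."""
    return len(set(labels) - {NOISE})


# Toy data (assumed): two dense groups 0.6 apart and one sparse group.
dense_a = [i / 10 for i in range(5)]        # spacing 0.1
dense_b = [1 + i / 10 for i in range(5)]    # spacing 0.1
sparse_c = [10.0 + i for i in range(5)]     # spacing 1.0
points = dense_a + dense_b + sparse_c

sweep = [0.05, 0.15, 0.3, 0.5, 0.7, 1.05, 2.0, 5.0, 9.0]

if __name__ == "__main__":
    for eps in sweep:
        labels = dbscan_1d(points, eps, min_samples=3)
        print(f"eps={eps:<5} clusters={n_clusters(labels)} labels={labels}")
```

In this toy setup, a small epsilon separates the two dense groups but labels the sparse group as noise, while an epsilon large enough to recover the sparse group merges the two dense ones, so no value in the sweep yields all three groups. Considering the whole range of epsilon values, as the summary describes, is what removes the need to pick one.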