# hdbscan: Hierarchical density based clustering

@article{McInnes2017hdbscanHD, title={hdbscan: Hierarchical density based clustering}, author={Leland McInnes and John Healy and S. Astels}, journal={J. Open Source Softw.}, year={2017}, volume={2}, pages={205} }

HDBSCAN: Hierarchical Density-Based Spatial Clustering of Applications with Noise (Campello, Moulavi, and Sander 2013), (Campello et al. 2015). [...] The library also includes support for Robust Single Linkage clustering (Chaudhuri et al. 2014), (Chaudhuri and Dasgupta 2010), GLOSH outlier detection (Campello et al. 2015), and tools for visualizing and exploring cluster structures. Finally, support for prediction and soft clustering is also available.

#### 428 Citations

HDBSCAN(ε̂): An Alternative Cluster Extraction Method for HDBSCAN

- Computer Science
- ArXiv
- 2019

This work proposes an alternative method for selecting clusters from the HDBSCAN hierarchy that is particularly useful for data sets with variable densities, where a low minimum cluster size is required but an abundance of micro-clusters in high-density regions should be avoided.

HDBSCAN($\hat{\epsilon}$): An Alternative Cluster Extraction Method for HDBSCAN

- Computer Science
- 2019

This work proposes an alternative method for selecting clusters from the HDBSCAN hierarchy that uses an additional input parameter $\hat{\epsilon}$ and acts like a hybrid between DBSCAN* and HDBSCAN.

FISHDBC: Flexible, Incremental, Scalable, Hierarchical Density-Based Clustering for Arbitrary Data and Distance

- Computer Science, Mathematics
- ArXiv
- 2019

FISHDBC is a flexible, incremental, scalable, and hierarchical density-based clustering algorithm that approximates HDBSCAN*, an evolution of DBSCAN.

DenMune: Density peak based clustering using mutual nearest neighbors

- Computer Science
- Pattern Recognit.
- 2021

A novel clustering algorithm that identifies dense regions using mutual nearest neighborhoods of size K, where K is the only parameter required from the user. The algorithm obeys the mutual nearest neighbor consistency principle and produces robust results on various low- and high-dimensional datasets relative to several known state-of-the-art clustering algorithms.

Accelerated Hierarchical Density Based Clustering

- Mathematics, Computer Science
- 2017 IEEE International Conference on Data Mining Workshops (ICDMW)
- 2017

The accelerated HDBSCAN* algorithm provides comparable performance to DBSCAN while supporting variable-density clusters and eliminating the need for the difficult-to-tune distance scale parameter epsilon, making it the default choice for density-based clustering.

Chameleon 2

- Computer Science
- ACM Trans. Knowl. Discov. Data
- 2019

This work proposes an improved graph-based clustering algorithm called Chameleon 2, which overcomes several drawbacks of state-of-the-art clustering approaches, modifies the internal cluster quality measure, and adds an extra step to ensure algorithm robustness.

Condorcet Optimal Clustering with Delaunay Triangulation: Climate Zones and World Happiness Insights

- Computer Science
- SBP-BRiMS
- 2019

A novel modification to Condorcet clustering methods is proposed, which improves it significantly on both accounts and works particularly well when applied to social network type data sets.

Finding landmarks within settled areas using hierarchical density-based clustering and meta-data from publicly available images

- Computer Science
- Expert Syst. Appl.
- 2019

Two novel density-based clustering algorithms that can be applied to determine relevant landmarks within a certain region are presented: K-DBSCAN, a clustering algorithm based on Gaussian kernels used to detect individual inhabited cores within regions; and V-DBSCAN, a hierarchical algorithm suitable for sample spaces with variable density, which is used to attempt the discovery of relevant landmarks in cities or regions.

Clustering tendency assessment for datasets having inter-cluster density variations

- Computer Science
- 2020 International Conference on Signal Processing and Communications (SPCOM)
- 2020

Numerical experiments comparing the proposed novel approach with baseline VAT/iVAT as well as spectral clustering and density-based clustering algorithms establish that LS-VAT and LS-iVAT are superior to the comparable algorithms in terms of clustering quality.

AMTICS: Aligning Micro-clusters to Identify Cluster Structures

- Computer Science
- DASFAA
- 2020

AMTICS is developed as a novel and efficient divide-and-conquer approach to pre-cluster data in distributed instances and align the results in a hierarchy afterward.

#### References

Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection

- Computer Science, Mathematics
- ACM Trans. Knowl. Discov. Data
- 2015

An integrated framework for density-based cluster analysis, outlier detection, and data visualization is introduced, consisting of an algorithm to compute hierarchical estimates of the level sets of a density, following Hartigan’s classic model of density-contour clusters and trees.

Density-Based Clustering Based on Hierarchical Density Estimates

- Mathematics, Computer Science
- PAKDD
- 2013

This work proposes a theoretically and practically improved density-based, hierarchical clustering method, providing a clustering hierarchy from which a simplified tree of significant clusters can be constructed, and proposes a novel cluster stability measure.

Consistent Procedures for Cluster Tree Estimation and Pruning

- Mathematics, Computer Science
- IEEE Transactions on Information Theory
- 2014

A tree pruning procedure is studied that guarantees, under milder conditions than usual, to remove clusters that are spurious while recovering those that are salient, and lower bounds on the sample complexity of cluster tree estimation are derived.

Rates of convergence for the cluster tree

- Computer Science, Mathematics
- NIPS
- 2010

Finite-sample convergence rates for the algorithm and lower bounds on the sample complexity of this estimation problem are given.

hdbscan: Hierarchical density based clustering

- Computer Science
- J. Open Source Softw.
- 2017

HDBSCAN performs DBSCAN over varying epsilon values and integrates the result to find a clustering that gives the best stability over epsilon, which allows HDBSCAN to find clusters of varying densities and to be more robust to parameter selection.
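
The idea of running a DBSCAN-like procedure over varying epsilon values can be illustrated with a small standard-library sketch. This is not the hdbscan library's implementation, and it does not compute cluster stability. It only shows how the groups found by a DBSCAN*-style rule change as epsilon grows on toy one-dimensional data. The function names (`core_distances`, `mutual_reachability`, `clusters_at`) and the toy points are assumptions made for illustration.

```python
from itertools import combinations


def core_distances(points, k):
    """Distance from each 1-D point to its k-th nearest other point."""
    cores = []
    for i, p in enumerate(points):
        dists = sorted(abs(p - q) for j, q in enumerate(points) if j != i)
        cores.append(dists[k - 1])
    return cores


def mutual_reachability(points, cores, i, j):
    """Largest of the two core distances and the plain distance."""
    return max(cores[i], cores[j], abs(points[i] - points[j]))


def clusters_at(points, cores, eps):
    """DBSCAN*-style groups at a single eps (hypothetical helper).

    Points whose core distance exceeds eps are treated as noise. The
    remaining points are linked when their mutual reachability
    distance is at most eps, and each linked component is one group.
    """
    parent = list(range(len(points)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(points)), 2):
        if mutual_reachability(points, cores, i, j) <= eps:
            parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(points):
        if cores[i] <= eps:
            groups.setdefault(find(i), []).append(p)
    return sorted(groups.values())


# Toy data: one dense group and one sparse group (integers avoid
# floating-point rounding in the comparisons).
points = [0, 1, 2, 3, 50, 60, 70, 80]
cores = core_distances(points, k=2)

for eps in (1, 2, 10, 20, 47):
    print(eps, clusters_at(points, cores, eps))
```

On this toy data, `eps = 2` recovers only the dense group and treats the sparse points as noise, while `eps = 20` recovers both groups. No single small epsilon captures both densities, which is the motivation for examining the whole range of epsilon values rather than fixing one.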