Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection

Ricardo J. G. B. Campello, Davoud Moulavi, Arthur Zimek, Jörg Sander
ACM Transactions on Knowledge Discovery from Data (TKDD), pages 1-51
An integrated framework for density-based cluster analysis, outlier detection, and data visualization is introduced in this article. The main module consists of an algorithm to compute hierarchical estimates of the level sets of a density, following Hartigan’s classic model of density-contour clusters and trees. Such an algorithm generalizes and improves existing density-based clustering techniques with respect to different aspects. It provides as a result a complete clustering hierarchy… 

Density-based Clustering using Automatic Density Peak Detection

Experiments on several synthetic and real-world datasets show the superiority of the proposed method in identifying centroids in datasets with various distributions and dimensionalities, and show that it can be effectively applied to image segmentation.

A Modularity-Based Measure for Cluster Selection from Clustering Hierarchies

An adaptation of a variant of the Modularity Q measure is proposed so that it can be applied as an optimization criterion to the problem of optimal local cuts through clustering hierarchies, and the results suggest that the proposed measure is a competitive alternative, especially for high-dimensional data.

An efficient density-based clustering for multi-dimensional database

A density-based clustering algorithm that adopts a divide-and-conquer strategy and presents a way to automatically determine the grid cell width, showing that the proposed algorithm efficiently finds accurate clusters in both low-dimensional and multi-dimensional databases.

Linear density-based clustering with a discrete density model

Experimental results prove the efficiency and the validity of this approach over DBSCAN in the context of spatial data clustering, enabling the use of a density-based clustering technique on large datasets with low computational cost.

Efficient Computation of Multiple Density-Based Clustering Hierarchies

This paper proposes an efficient approach to compute all HDBSCAN* hierarchies for a range of mpts values by replacing the graph used by HDBSCAN* with a much smaller graph that is guaranteed to contain the required information.
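
For context, the graph that this summary refers to is usually described as the mutual reachability graph, which has to be rebuilt for every value of mpts in the naive approach. The sketch below illustrates that naive baseline only, not the smaller graph the paper proposes. The function names are hypothetical, and the convention that a point counts as its own first neighbor is an assumption that varies between implementations.

```python
from math import dist

def core_distances(points, mpts):
    """Distance from each point to its mpts-th nearest neighbor.

    Assumption: the point itself counts as its own first neighbor,
    so mpts=1 yields a core distance of 0 for every point.
    """
    cores = []
    for p in points:
        d = sorted(dist(p, q) for q in points)  # d[0] is the point itself
        cores.append(d[mpts - 1])
    return cores

def mutual_reachability(points, mpts):
    """Edge weights of the complete mutual reachability graph for one mpts.

    Weight of edge (i, j) = max(core(i), core(j), distance(i, j)).
    """
    cores = core_distances(points, mpts)
    n = len(points)
    return {
        (i, j): max(cores[i], cores[j], dist(points[i], points[j]))
        for i in range(n)
        for j in range(i + 1, n)
    }

if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (3.0,), (7.0,)]
    # Naive baseline: one complete graph per mpts value.
    for mpts in (1, 2, 3):
        print(mpts, mutual_reachability(pts, mpts))
```

Each complete graph has n(n-1)/2 edges, so repeating this for every mpts in a range is the cost that a smaller shared graph would avoid.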

Improving the Grid-based Clustering by Identifying Cluster Center Nodes and Boundary Nodes Adaptively

An adaptive grid-based clustering method is proposed, in which the definition of cluster center nodes and boundary nodes is based on relative density values between data points, without using a global threshold.

A Multi-Density Clustering Algorithm Based on Similarity for Dataset With Density Variation

This paper proposes the DENSS algorithm, which clusters objects on the basis of the similarity of their neighbor distributions and the number of neighbors two objects share, and reports that it is markedly superior to seven other clustering algorithms.



Design and Implementation of Scalable Hierarchical Density Based Clustering

This thesis introduces Partitioned HDS, which significantly reduces time and space complexity and makes it possible to generate the Auto-HDS cluster hierarchy on much larger datasets with hundreds of millions of data points. It also describes Parallel Auto-HDS, which takes advantage of the inherent parallelism available in Partitioning Auto-HDS to scale to even larger datasets without a corresponding increase in actual run time.

A Generalized Single Linkage Method for Estimating the Cluster Tree of a Density

This paper presents a graph-based method that can approximate the cluster tree of any density estimate, and proposes excess mass as a measure of the size of a branch, reflecting the height of the corresponding density peak above the surrounding valley floor as well as its spatial extent.

OPTICS: ordering points to identify the clustering structure

A new algorithm is introduced for the purpose of cluster analysis which does not produce a clustering of a data set explicitly, but instead creates an augmented ordering of the database representing its density-based clustering structure.

Automated Hierarchical Density Shaving: A Robust Automated Clustering and Visualization Framework for Large Biological Data Sets

Automated Hierarchical Density Shaving (Auto-HDS) is a framework that consists of a fast hierarchical density-based clustering algorithm and an unsupervised model selection strategy. It can automatically select clusters of different densities, present them in a compact hierarchy, and rank individual clusters using an innovative stability criterion.

A New Shared Nearest Neighbor Clustering Algorithm and its Applications

This paper offers definitions of density and similarity that work well for high-dimensional data (actually, for data of any dimensionality). It uses a similarity measure based on the number of neighbors that two points share, and defines the density of a point as the sum of the similarities of that point's nearest neighbors.
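
The two definitions in this summary can be illustrated with a short sketch. It is a minimal reading of the summary, not the paper's algorithm: the function names, the use of Euclidean distance, the exclusion of a point from its own neighbor list, and the choice of k are all assumptions, and the published method may add further conditions (for example, requiring that two points appear in each other's neighbor lists) that are not modelled here.

```python
from math import dist

def knn_lists(points, k):
    """Indices of the k nearest neighbors of each point, excluding the point itself."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        neighbors.append(others[:k])
    return neighbors

def snn_similarity(neighbors, i, j):
    """Similarity of points i and j = number of neighbors they share."""
    return len(set(neighbors[i]) & set(neighbors[j]))

def snn_density(neighbors, i):
    """Density of point i = sum of its similarities to its own nearest neighbors."""
    return sum(snn_similarity(neighbors, i, j) for j in neighbors[i])

if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (2.1,), (3.3,), (20.0,)]
    nb = knn_lists(pts, k=2)
    for i in range(len(pts)):
        print(i, nb[i], snn_density(nb, i))
```

Because similarity depends only on neighbor lists rather than on raw distances, the measure is insensitive to the absolute scale of the data, which is one reason shared-neighbor approaches are used in high dimensions.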

Density-Based Clustering Based on Hierarchical Density Estimates

This work proposes a theoretically and practically improved density-based, hierarchical clustering method, providing a clustering hierarchy from which a simplified tree of significant clusters can be constructed, and proposes a novel cluster stability measure.

Density-based semi-supervised clustering

This study proposes a semi-supervised density-based clustering method that enhances the DBSCAN algorithm with Must-Link and Cannot-Link constraints, and shows that this approach improves the performance of the algorithm.

Automatic Extraction of Clusters from Hierarchical Clustering Representations

This paper investigates the relation between dendrograms and reachability plots and introduces methods to convert each into the other, showing that they essentially contain the same information. It also introduces a technique that automatically determines the significant clusters in a hierarchical cluster representation.

Semi-supervised Density-Based Clustering

This work describes how labeled objects can be used to help the algorithm detect suitable density parameters for extracting density-based clusters in specific parts of the feature space.