A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise
- M. Ester, H. Kriegel, J. Sander, Xiaowei Xu
- Computer Science
- Knowledge Discovery and Data Mining
- 2 August 1996
This paper presents DBSCAN, a clustering algorithm that relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape. It requires only one input parameter and supports the user in determining an appropriate value for it.
LOF: identifying density-based local outliers
This paper contends that for many scenarios, it is more meaningful to assign to each object a degree of being an outlier, called the local outlier factor (LOF), and gives a detailed formal analysis showing that LOF enjoys many desirable properties.
OPTICS: ordering points to identify the clustering structure
A new algorithm is introduced for cluster analysis that does not produce a clustering of a data set explicitly but instead creates an augmented ordering of the database representing its density-based clustering structure.
Density-Based Clustering Based on Hierarchical Density Estimates
- R. Campello, D. Moulavi, J. Sander
- Computer Science
- Pacific-Asia Conference on Knowledge Discovery…
- 14 April 2013
This work proposes a theoretically and practically improved density-based, hierarchical clustering method, providing a clustering hierarchy from which a simplified tree of significant clusters can be constructed, and proposes a novel cluster stability measure.
Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications
- J. Sander, M. Ester, H. Kriegel, Xiaowei Xu
- Computer Science
- Data Mining and Knowledge Discovery
- 1 June 1998
The generalized algorithm, GDBSCAN, can cluster point objects as well as spatially extended objects according to both their spatial and their nonspatial attributes. Four applications using 2D points (astronomy), 3D points (biology), 5D points, and 2D polygons are presented, demonstrating the applicability of GDBSCAN to real-world problems.
Incremental Clustering for Mining in a Data Warehousing Environment
- M. Ester, H. Kriegel, J. Sander, M. Wimmer, Xiaowei Xu
- Computer Science
- Very Large Data Bases Conference
- 24 August 1998
It can be proven that the incremental algorithm yields the same result as DBSCAN. DBSCAN is applicable to any database containing data from a metric space, e.g., a spatial database or a WWW-log database.
Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection
- R. Campello, D. Moulavi, A. Zimek, J. Sander
- Computer Science
- ACM Transactions on Knowledge Discovery from Data
- 22 July 2015
An integrated framework for density-based cluster analysis, outlier detection, and data visualization is introduced, consisting of an algorithm to compute hierarchical estimates of the level sets of a density, following Hartigan’s classic model of density-contour clusters and trees.
On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study
This paper presents an extensive experimental study of the performance of a representative set of standard k-nearest-neighbor-based methods for unsupervised outlier detection, across a wide variety of datasets prepared for this purpose, and provides a characterization of the datasets themselves.
DBSCAN Revisited, Revisited
- Erich Schubert, J. Sander, M. Ester, H. Kriegel, Xiaowei Xu
- Computer Science
- ACM Transactions on Database Systems
- 31 July 2017
New experiments show that the SIGMOD 2015 methods do not appear to offer practical benefits when the DBSCAN parameters are well chosen, and thus they are primarily of theoretical interest.
Finding non-redundant, statistically significant regions in high dimensional data: a novel approach to projected and subspace clustering
This work proposes a novel problem formulation that aims at extracting axis-parallel regions that stand out in the data in a statistical sense and proposes the approximation algorithm STATPC, which significantly outperforms existing projected and subspace clustering algorithms in terms of accuracy.