DBSCAN Revisited, Revisited

@article{Schubert2017DBSCANRR,
  title={DBSCAN Revisited, Revisited},
  author={Erich Schubert and J{\"o}rg Sander and Martin Ester and Hans-Peter Kriegel and Xiaowei Xu},
  journal={ACM Transactions on Database Systems (TODS)},
  year={2017},
  volume={42},
  pages={1--21}
}
At SIGMOD 2015, an article was presented with the title “DBSCAN Revisited: Mis-Claim, Un-Fixability, and Approximation” that won the conference’s best paper award. In this technical correspondence, we want to point out some inaccuracies in the way DBSCAN was represented, and why the criticism should have been directed at the assumption about the performance of spatial index structures such as R-trees and not at an algorithm that can use such indexes. We will also discuss the relationship of… 

On Metric DBSCAN with Low Doubling Dimension

TLDR
This paper considers the metric DBSCAN problem under the assumption that the inliers (excluding the outliers) have a low doubling dimension, and applies a novel randomized k-center clustering idea to reduce the complexity of range queries.

An Efficient Density-based Clustering Algorithm for Higher-Dimensional Data

TLDR
A novel algorithm named GDPAM is proposed attempting to extend Grid-based DBSCAN to higher data dimension by adopting an efficient union-find algorithm to maintain the clustering information in order to reduce redundancies in the merging.
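GDPAM's actual data structures are not reproduced here, but the merging step the TLDR describes rests on a standard disjoint-set (union-find) structure for fusing provisional cluster labels found in different grid cells. A generic sketch (class name and sizes are illustrative, not GDPAM's code):

```python
class UnionFind:
    """Disjoint-set with path halving and union by rank, as used to
    merge per-cell cluster labels in grid-based DBSCAN variants."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1

# Merging partial clusters discovered in adjacent grid cells:
uf = UnionFind(5)      # five provisional cluster labels
uf.union(0, 1)         # clusters 0 and 1 share a border point
uf.union(3, 4)
assert uf.find(1) == uf.find(0)
assert uf.find(3) == uf.find(4)
assert uf.find(2) != uf.find(0)
```

With near-constant amortized cost per union/find, the merge phase avoids the redundant re-labeling the TLDR alludes to.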

Theoretically-Efficient and Practical Parallel DBSCAN

TLDR
This paper presents new parallel algorithms for Euclidean exact DBSCAN and approximate DBSCAN that match the work bounds of their sequential counterparts, and are highly parallel (polylogarithmic depth).

Comparative Analysis Review of Pioneering DBSCAN and Successive Density-Based Clustering Algorithms

TLDR
The implementation, features, strengths, and drawbacks of DBSCAN are thoroughly examined, and the successive algorithms proposed to improve on the original DBSCAN are classified based on their motivations and are discussed.

Anytime parallel density-based clustering

TLDR
This paper proposes a novel anytime approach, called AnyDBC, that compresses the data into smaller density-connected subsets called primitive clusters and labels objects based on connected components of these primitive clusters to reduce the label propagation time of DBSCAN.

Scaling Density-Based Clustering to Large Collections of Sets

TLDR
A new, density-based clustering algorithm that processes data points in any user-defined order and does not need to materialize neighborhoods is proposed, and is the first DBSCAN-compliant algorithm that can leverage asymmetric indexes in linear space.

DBSVEC: Density-Based Clustering Using Support Vector Expansion

TLDR
DBSVEC introduces support vectors into density-based clustering, performing range queries only on a small subset of points called core support vectors; this significantly improves efficiency while retaining high-quality clustering results.
...

References

Showing 1-10 of 48 references

AnyDBC: An Efficient Anytime Density-based Clustering Algorithm for Very Large Complex Datasets

TLDR
A novel anytime approach to reducing both the range-query and label-propagation time of DBSCAN: the data is compressed into smaller density-connected subsets called primitive clusters, and objects are labeled based on connected components of these primitive clusters.

A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise

TLDR
DBSCAN, a new clustering algorithm relying on a density-based notion of clusters and designed to discover clusters of arbitrary shape, is presented; it requires only one input parameter and supports the user in determining an appropriate value for it.
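The density-based notion of clusters can be made concrete with a minimal, unoptimized sketch of the classic algorithm: a point is a core point if its eps-neighborhood holds at least min_pts points, and clusters grow by expanding from core points. Linear-scan range queries stand in here for the spatial index a real implementation would use; parameter names are illustrative.

```python
from math import dist  # Python 3.8+

def dbscan(points, eps, min_pts):
    """Textbook DBSCAN sketch. Labels each point with a cluster id
    (0, 1, ...) or -1 for noise. Range queries are plain linear scans."""
    n = len(points)
    labels = [None] * n                       # None = not yet visited

    def region_query(i):                      # eps-neighborhood, incl. i itself
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region_query(i)
        if len(seeds) < min_pts:              # not a core point (for now)
            labels[i] = -1
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:               # noise becomes a border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = region_query(j)
            if len(nb) >= min_pts:            # j is a core point: keep expanding
                queue.extend(nb)
        cluster += 1
    return labels

# Two dense groups plus one outlier:
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
assert dbscan(pts, eps=2.0, min_pts=3) == [0, 0, 0, 1, 1, 1, -1]
```

The brute-force region_query is exactly the cost the "DBSCAN Revisited, Revisited" correspondence is about: the algorithm's runtime hinges on how fast these neighborhood queries are, which is a property of the index, not of DBSCAN itself.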

A faster algorithm for DBSCAN

TLDR
This master's thesis focuses on improving the running time of DBSCAN, a density-based clustering algorithm, by introducing a faster algorithm which theoretically runs in O(n log n) time in the worst case, and experimentally investigates a simplified version of this algorithm.

When Is ''Nearest Neighbor'' Meaningful?

TLDR
The effect of dimensionality on the "nearest neighbor" problem is explored, and it is shown that under a broad set of conditions, as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point.
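This distance-concentration effect is easy to observe empirically. A tiny self-contained illustration with uniform random data (sample sizes and thresholds are illustrative, not from the paper):

```python
import random

def contrast(dim, n=2000, seed=0):
    """Ratio of farthest to nearest Euclidean distance from the origin
    for n uniform random points in [0,1]^dim. Tends toward 1 as dim grows."""
    rng = random.Random(seed)
    sq_dists = []
    for _ in range(n):
        p = [rng.random() for _ in range(dim)]
        sq_dists.append(sum(x * x for x in p))
    return (max(sq_dists) / min(sq_dists)) ** 0.5

low_d, high_d = contrast(2), contrast(1000)
assert low_d > high_d    # distance contrast shrinks as dimension grows
assert high_d < 1.5      # near-equal distances at dim = 1000
```

When nearest and farthest distances are nearly equal, a fixed-radius eps-neighborhood (as in DBSCAN) loses its discriminative power, which is why this result matters for density-based clustering in high dimensions.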

The (black) art of runtime evaluation: Are we comparing algorithms or implementations?

TLDR
This work substantiates its points with extensive experiments, using clustering and outlier detection methods with and without index acceleration, and discusses what one can learn from evaluations, whether experiments are properly designed, and what kind of conclusions one should avoid.

A note on the nearest neighbor in growth-restricted metrics

TLDR
This paper gives results relevant to sequential and distributed dynamic data structures for finding nearest neighbors in growth-restricted metrics, and improves on the time bound of a load-balanced version of an algorithm (for dynamic networks) presented in [3].

STING: A Statistical Information Grid Approach to Spatial Data Mining

TLDR
The idea is to capture statistical information associated with spatial cells in such a manner that whole classes of queries and clustering problems can be answered without recourse to the individual objects.

A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces

TLDR
It is shown formally that partitioning and clustering techniques for similarity search in HDVSs exhibit linear complexity at high dimensionality, and that existing methods are outperformed on average by a simple sequential scan if the number of dimensions exceeds around 10.

Using grid for accelerating density-based clustering

  • Shaaban Mahran, K. Mahar
  • Computer Science
    2008 8th IEEE International Conference on Computer and Information Technology
  • 2008
TLDR
A new algorithm GriDBSCAN is introduced to enhance the performance of DBSCAN using grid partitioning and merging, yielding a high performance with the advantage of high degree of parallelism.

R-trees Have Grown Everywhere

TLDR
An extensive survey of the R-tree evolution is provided, studying the applicability of the structure and its variations to efficient query processing, accurate proposed cost models, and implementation issues like concurrency control and parallelism.