DBSCAN Revisited: Mis-Claim, Un-Fixability, and Approximation
@article{Gan2015DBSCANRM, title={DBSCAN Revisited: Mis-Claim, Un-Fixability, and Approximation}, author={Junhao Gan and Yufei Tao}, journal={Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data}, year={2015} }
DBSCAN is a popular method for clustering multi-dimensional objects. [...] Key Result: We formalize our findings into the new notion of ρ-approximate DBSCAN, which we believe should replace DBSCAN on big data due to the latter's computational intractability.
165 Citations
On the Hardness and Approximation of Euclidean DBSCAN
- Computer Science
- ACM Trans. Database Syst.
- 2017
It is proved that, for d ≥ 3, the problem of computing DBSCAN clusters from scratch requires Ω(n^{4/3}) time to solve, unless very significant breakthroughs (ones widely believed to be impossible) could be made in theoretical computer science.
DBSCAN Revisited, Revisited
- Computer Science
- ACM Trans. Database Syst.
- 2017
In new experiments, it is shown that the new SIGMOD 2015 methods do not appear to offer practical benefits if the DBSCAN parameters are well chosen and thus they are primarily of theoretical interest.
A fast clustering algorithm based on pruning unnecessary distance computations in DBSCAN for high-dimensional data
- Computer Science
- Pattern Recognit.
- 2018
On Metric DBSCAN with Low Doubling Dimension
- Computer Science
- ArXiv
- 2020
This paper considers the metric DBSCAN problem under the assumption that the inliers (excluding the outliers) have a low doubling dimension and applies a novel randomized $k$-center clustering idea to reduce the complexity of range query, which is the most time-consuming step in the whole DBSCAN procedure.
KNN-BLOCK DBSCAN: Fast Clustering for Large-Scale Data
- Computer Science
- IEEE Transactions on Systems, Man, and Cybernetics: Systems
- 2021
A simple but fast approximate DBSCAN is proposed based on two findings: 1) the problem of identifying whether a point is a core point or not is, in fact, a kNN problem and 2) a point has a similar density distribution to its neighbors, and neighboring points are highly likely to be of the same type (core point, border point, or noise).
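The first finding can be illustrated with a minimal sketch. This is not the KNN-BLOCK DBSCAN algorithm itself, only the underlying equivalence: under the standard DBSCAN definition, where a point counts as its own neighbor, a point is a core point exactly when the distance to its MinPts-th nearest neighbor is at most ε. The function names, the brute-force neighbor search, and the sample points are assumptions made for illustration.

```python
import math


def kth_neighbor_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbor.

    The point itself counts as the first neighbor (distance 0).
    Brute force is used here for clarity, not speed.
    """
    dists = sorted(math.dist(points[i], q) for q in points)
    return dists[k - 1]


def is_core_by_knn(points, i, eps, min_pts):
    # Assumption: the eps-neighborhood includes the point itself,
    # as in the standard DBSCAN definition.
    if min_pts > len(points):
        return False
    return kth_neighbor_distance(points, i, min_pts) <= eps


def is_core_by_range_count(points, i, eps, min_pts):
    # Classic formulation: count the points within distance eps.
    count = sum(1 for q in points if math.dist(points[i], q) <= eps)
    return count >= min_pts


# Hypothetical sample data: a unit square plus one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
```

Both predicates agree on every point for any `eps` and `min_pts`, which is why a kNN index can stand in for a range query when only the core/non-core decision is needed.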
DBSCAN++: Towards fast and scalable density clustering
- Computer Science
- ICML
- 2019
Surprisingly, up to a certain point, one can enjoy the same estimation rates while lowering computational cost, showing that DBSCAN++ is a sub-quadratic algorithm that attains minimax optimal rates for level-set estimation, a quality that may be of independent interest.
Dynamic Density Based Clustering
- Computer Science
- SIGMOD Conference
- 2017
It is proved that the ρ-approximate version of DBSCAN suffers from the very same hardness when the dataset is fully dynamic, namely, when both insertions and deletions are allowed, and it is shown that this issue goes away as soon as a tiny further relaxation is applied, yet still ensuring the same quality, known as the "sandwich guarantee", of ρ.
An Efficient Density-based Clustering Algorithm for Higher-Dimensional Data
- Computer Science
- ArXiv
- 2018
A novel algorithm named GDPAM is proposed attempting to extend Grid-based DBSCAN to higher data dimension by adopting an efficient union-find algorithm to maintain the clustering information in order to reduce redundancies in the merging.
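As a rough sketch of how a union-find structure can cut redundant work when merging grid cells: once two cells are known to belong to the same cluster, the connectivity check between them can be skipped. This is not GDPAM itself; the class, the `merge_cells` helper, and the `cells_connected` callback are hypothetical names, and the skip rule is an assumption about where the savings come from.

```python
class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already in the same set
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True


def merge_cells(num_cells, candidate_pairs, cells_connected):
    """Merge neighboring cells, skipping pairs already in one cluster.

    cells_connected(a, b) is a hypothetical, possibly expensive test
    of whether cells a and b should be merged.
    Returns the union-find structure and the number of tests run.
    """
    uf = UnionFind(num_cells)
    checks = 0
    for a, b in candidate_pairs:
        if uf.find(a) == uf.find(b):
            continue  # redundant: the two cells are already merged
        checks += 1
        if cells_connected(a, b):
            uf.union(a, b)
    return uf, checks
```

For example, with candidate pairs `(0, 1)`, `(1, 2)`, `(0, 2)` that all pass the connectivity test, the third pair is never tested because cells 0 and 2 are already in the same set.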
References
SHOWING 1-10 OF 35 REFERENCES
Approximate range searching
- Computer Science
- SCG '95
- 1995
It is shown that if one is willing to allow approximate ranges, then it is possible to do much better than current state-of-the-art results, and empirical evidence is given showing that allowing small relative errors can significantly improve query execution times.
A faster algorithm for DBSCAN
- Computer Science
- 2013
This master's thesis focuses on improving the running time of DBSCAN, a density-based clustering algorithm, by introducing a faster algorithm that theoretically runs in O(n log n) time in the worst case, and experimentally investigates a simplified version of this algorithm.
SPARCL: Efficient and Effective Shape-Based Clustering
- Computer Science
- 2008 Eighth IEEE International Conference on Data Mining
- 2008
This paper proposes SPARCL, a simple and scalable algorithm for finding clusters with arbitrary shapes and sizes, and it has linear space and time complexity.
A Fast Density-Based Clustering Algorithm for Large Databases
- Computer Science
- 2006 International Conference on Machine Learning and Cybernetics
- 2006
A fast density-based clustering algorithm is presented based on DBSCAN that selects orderly unlabelled points outside a core object's neighborhood as seeds to expand clusters so that the execution frequency of region queries can be decreased.
New lower bounds for Hopcroft's problem
- Computer Science, Mathematics
- Discret. Comput. Geom.
- 1996
A combinatorial representation of the relative order type of a set of points and hyperplanes, called a monochromatic cover, is defined, and lower bounds on its size in the worst case are derived, showing that the running time of any partitioning algorithm is bounded below by the size of some monochromatic cover.
On the relative complexities of some geometric problems
- Mathematics, Computer Science
- CCCG
- 1995
This paper considers the relative complexities of a large number of computational geometry problems whose complexities are believed to be roughly Θ(n^{4/3}), and surveys known reductions among problems involving lines in three-space, and among higher-dimensional closest-pair problems.
A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise
- Computer Science
- KDD
- 1996
DBSCAN, a new clustering algorithm relying on a density-based notion of clusters which is designed to discover clusters of arbitrary shape, is presented which requires only one input parameter and supports the user in determining an appropriate value for it.
STING: A Statistical Information Grid Approach to Spatial Data Mining
- Computer Science
- VLDB
- 1997
The idea is to capture statistical information associated with spatial cells in such a manner that whole classes of queries and clustering problems can be answered without recourse to the individual objects.
Range searching with efficient hierarchical cuttings
- Computer Science
- SCG '92
- 1992
It is shown that multilevel range searching data structures can be built with only a polylogarithmic overhead in space and query time per level (the previous solutions require at least a small fixed power of n).
OPTICS: ordering points to identify the clustering structure
- Computer Science
- SIGMOD '99
- 1999
A new algorithm is introduced for the purpose of cluster analysis which does not produce a clustering of a data set explicitly, but instead creates an augmented ordering of the database representing its density-based clustering structure.