DBSCAN Revisited: Mis-Claim, Un-Fixability, and Approximation

@article{Gan2015DBSCANRM,
  title={DBSCAN Revisited: Mis-Claim, Un-Fixability, and Approximation},
  author={Junhao Gan and Yufei Tao},
  journal={Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data},
  year={2015}
}
  • Junhao Gan, Yufei Tao
  • Published 27 May 2015
  • Computer Science
  • Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
DBSCAN is a popular method for clustering multi-dimensional objects. [...] Key Result: We formalize our findings into the new notion of ρ-approximate DBSCAN, which we believe should replace DBSCAN on big data due to the latter's computational intractability.
On the Hardness and Approximation of Euclidean DBSCAN
TLDR
It is proved that, for d ≥ 3, the problem of computing DBSCAN clusters from scratch requires $\Omega(n^{4/3})$ time to solve, unless very significant breakthroughs (ones widely believed to be impossible) could be made in theoretical computer science.
DBSCAN Revisited, Revisited
TLDR
In new experiments, it is shown that the new SIGMOD 2015 methods do not appear to offer practical benefits if the DBSCAN parameters are well chosen and thus they are primarily of theoretical interest.
On Metric DBSCAN with Low Doubling Dimension
TLDR
This paper considers the metric DBSCAN problem under the assumption that the inliers (excluding the outliers) have a low doubling dimension and applies a novel randomized $k$-center clustering idea to reduce the complexity of range query, which is the most time-consuming step in the whole DBSCAN procedure.
KNN-BLOCK DBSCAN: Fast Clustering for Large-Scale Data
TLDR
A simple but fast approximate DBSCAN is proposed based on two findings: 1) the problem of identifying whether a point is a core point or not is, in fact, a kNN problem and 2) a point has a similar density distribution to its neighbors, and neighboring points are highly likely to be of the same type (core point, border point, or noise).
DBSCAN++: Towards fast and scalable density clustering
TLDR
Surprisingly, up to a certain point, one can enjoy the same estimation rates while lowering computational cost, showing that DBSCAN++ is a sub-quadratic algorithm that attains minimax optimal rates for level-set estimation, a quality that may be of independent interest.
Dynamic Density Based Clustering
TLDR
It is proved that the ρ-approximate version of DBSCAN suffers from the very same hardness when the dataset is fully dynamic, namely, when both insertions and deletions are allowed, and it is shown that this issue goes away as soon as a tiny further relaxation is applied, yet still ensuring the same quality (known as the "sandwich guarantee") of ρ.
An Efficient Density-based Clustering Algorithm for Higher-Dimensional Data
TLDR
A novel algorithm named GDPAM is proposed that attempts to extend grid-based DBSCAN to higher-dimensional data by adopting an efficient union-find algorithm to maintain the clustering information and so reduce redundancies in the merging.
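The first finding quoted in the KNN-BLOCK DBSCAN summary above (deciding whether a point is a core point is a kNN problem) can be illustrated with a minimal sketch. It is not taken from any of the listed papers: the function names, the sample points, and the values of eps and min_pts are hypothetical, and it assumes the common convention that a point counts as a member of its own ε-neighborhood.

```python
import math

def is_core_by_range_count(points, i, eps, min_pts):
    """Range-count test: at least min_pts points (including point i
    itself, an assumed convention) lie within distance eps of point i."""
    count = sum(1 for q in points if math.dist(points[i], q) <= eps)
    return count >= min_pts

def is_core_by_knn(points, i, eps, min_pts):
    """Equivalent kNN test: the min_pts-th smallest distance from point i
    (the point itself sits at distance 0) is at most eps."""
    if min_pts > len(points):
        return False
    dists = sorted(math.dist(points[i], q) for q in points)
    return dists[min_pts - 1] <= eps

# Hypothetical data: four points packed closely together and one far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
eps, min_pts = 0.2, 4

labels = [is_core_by_knn(points, i, eps, min_pts) for i in range(len(points))]
print(labels)  # [True, True, True, True, False]
```

Both functions here are brute force and scan every point. The sketch only shows that the two tests agree, not how either would be accelerated with an index.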

References

Approximate range searching
TLDR
It is shown that if one is willing to allow approximate ranges, then it is possible to do much better than current state-of-the-art results, and empirical evidence is given showing that allowing small relative errors can significantly improve query execution times.
A faster algorithm for DBSCAN
TLDR
This master's thesis focuses on improving the running time of DBSCAN, a density-based clustering algorithm, by introducing a faster algorithm that theoretically runs in O(n log n) time in the worst case, and it experimentally investigates a simplified version of this algorithm.
SPARCL: Efficient and Effective Shape-Based Clustering
TLDR
This paper proposes SPARCL, a simple and scalable algorithm for finding clusters with arbitrary shapes and sizes, and it has linear space and time complexity.
A Fast Density-Based Clustering Algorithm for Large Databases
  • Bing Liu
  • Computer Science
    2006 International Conference on Machine Learning and Cybernetics
  • 2006
TLDR
A fast density-based clustering algorithm is presented based on DBSCAN that selects orderly unlabelled points outside a core object's neighborhood as seeds to expand clusters so that the execution frequency of region queries can be decreased.
New lower bounds for Hopcroft's problem
  • Jeff Erickson
  • Computer Science, Mathematics
    Discret. Comput. Geom.
  • 1996
TLDR
A combinatorial representation of the relative order type of a set of points and hyperplanes, called a monochromatic cover, is defined, and lower bounds on its size in the worst case are derived, showing that the running time of any partitioning algorithm is bounded below by the size of some monochromatic cover.
On the relative complexities of some geometric problems
TLDR
This paper considers the relative complexities of a large number of computational geometry problems whose complexities are believed to be roughly $n^{4/3}$, and surveys known reductions among problems involving lines in three-space, and among higher-dimensional closest-pair problems.
A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise
TLDR
DBSCAN, a new clustering algorithm relying on a density-based notion of clusters which is designed to discover clusters of arbitrary shape, is presented which requires only one input parameter and supports the user in determining an appropriate value for it.
STING: A Statistical Information Grid Approach to Spatial Data Mining
TLDR
The idea is to capture statistical information associated with spatial cells in such a manner that whole classes of queries and clustering problems can be answered without recourse to the individual objects.
Range searching with efficient hierarchical cuttings
TLDR
It is shown that multilevel range searching data structures can be built with only a polylogarithmic overhead in space and query time per level (the previous solutions require at least a small fixed power of n).
OPTICS: ordering points to identify the clustering structure
TLDR
A new algorithm is introduced for the purpose of cluster analysis which does not produce a clustering of a data set explicitly; but instead creates an augmented ordering of the database representing its density-based clustering structure.