# A Fast Parallel Clustering Algorithm for Large Spatial Databases

@article{Xu2004AFP, title={A Fast Parallel Clustering Algorithm for Large Spatial Databases}, author={Xiaowei Xu and Jochen J{\"a}ger and Hans-Peter Kriegel}, journal={Data Mining and Knowledge Discovery}, year={1999}, volume={3}, pages={263--290} }

The clustering algorithm DBSCAN relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape as well as to distinguish noise. In this paper, we present PDBSCAN, a parallel version of this algorithm. We use the ‘shared-nothing’ architecture with multiple computers interconnected through a network. A fundamental component of a shared-nothing system is its distributed data structure. We introduce the dR*-tree, a distributed spatial index structure in which…
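
For reference, here is a minimal sequential sketch of DBSCAN itself, the algorithm that PDBSCAN parallelizes. It assumes that points are tuples of numbers and that distance is Euclidean. Each region query is a brute-force scan, not an R*-tree lookup. The function and constant names are illustrative and do not come from the paper. As in the original definition, a point's Eps-neighborhood includes the point itself.

```python
import math
from collections import deque

NOISE = -1           # label for points not density-reachable from any core point
UNCLASSIFIED = None  # label for points not yet visited


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [UNCLASSIFIED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNCLASSIFIED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later become a border point of some cluster
            continue
        # points[i] is a core point: start a new cluster and expand it breadth-first
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # reached from a core point, so it is a border point
            if labels[j] is not UNCLASSIFIED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
        cluster_id += 1
    return labels
```

For example, `dbscan([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)], 1.5, 3)` returns `[0, 0, 0, 1, 1, 1, -1]`. A point first marked as noise can later be relabeled as a border point. This happens when a core point's expansion reaches it. That is why the loop tests for `NOISE` before it skips points that already have a label.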

## 307 Citations

### A Parallel Algorithm for Fast Density-based Clustering in Large Spatial Databases

- Computer Science
- 2003

This paper presents PFDC, a parallel version of the FDC algorithm. PFDC is implemented in C++ with MPI message passing so that it runs on a variety of parallel platforms, and it shows good speedup.

### A new scalable parallel DBSCAN algorithm using the disjoint-set data structure

- Computer Science
- 2012 International Conference for High Performance Computing, Networking, Storage and Analysis
- 2012

This work employs the disjoint-set data structure to break the access sequentiality of DBSCAN and uses a tree-based bottom-up approach to construct the clusters, which yields a better-balanced workload distribution.
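
The sketch below replaces DBSCAN's sequential cluster expansion with disjoint-set unions. It is sequential and simplified, and it is not the paper's parallel implementation. Two parts are our own choices:

- The eps-neighborhoods are computed by brute force.
- The `DisjointSet` class uses path halving and makes the smaller index the root.

Unioning every pair of neighboring core points gives the same connected components in any order. That property is what allows such unions, in principle, to be split across processes and combined afterwards.

```python
import math


class DisjointSet:
    """Union-find over indices 0..n-1 with path halving."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[max(ra, rb)] = min(ra, rb)  # smaller index becomes the root


def dbscan_union_find(points, eps, min_pts):
    """Label points with cluster ids (0, 1, ...) or -1 for noise."""
    n = len(points)
    # Brute-force eps-neighborhoods (each includes the point itself)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    ds = DisjointSet(n)
    # Merge every pair of neighboring core points; the order of unions does not matter
    for i in range(n):
        if is_core[i]:
            for j in neighbors[i]:
                if is_core[j]:
                    ds.union(i, j)
    labels = [-1] * n
    ids = {}  # maps a tree root to a consecutive cluster id
    for i in range(n):
        if is_core[i]:
            root = ds.find(i)
        else:
            core_nb = [j for j in neighbors[i] if is_core[j]]
            if not core_nb:
                continue  # no core neighbor: noise
            root = ds.find(core_nb[0])  # border point joins one adjacent cluster
        labels[i] = ids.setdefault(root, len(ids))
    return labels
```

On the same input as a classic DBSCAN run, this returns the same partition into clusters. A border point that touches several clusters is assigned to only one of them.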

### PaX-DBSCAN: a proposed algorithm for improved clustering

- Computer Science
- 2016

The paper proposes a new parallel algorithm for DBSCAN together with an extension of the X-tree spatial indexing structure, and gives a full description of how the system can be achieved.

### Parallel Processing for Density-based Spatial Clustering Algorithm using Complex Grid Partitioning and Its Performance Evaluation

- Computer Science
- 2016

This paper proposes a new parallelization model for DBSCAN, one of the most fundamental density-based spatial clustering algorithms. The model runs on a multi-core CPU and uses a spatial partitioning method to improve the speedup of parallel processing.
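
A spatially partitioned DBSCAN can keep its eps-neighborhood queries local by giving each partition a halo. The halo holds copies of nearby points from other partitions. The sketch below is a simpler, hypothetical stand-in for the paper's complex grid partitioning. It cuts 2-D points into equal-width vertical strips along the x-axis. The function name and the dict layout are our own.

```python
def partition_with_halo(points, num_parts, eps):
    """Split 2-D points into vertical strips and replicate boundary points.

    Each strip owns the points whose x falls inside it. Its halo holds points
    owned by other strips that lie within eps of the strip's x-range, so the
    eps-neighborhood of any owned point is contained in owned + halo.
    """
    xs = [p[0] for p in points]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / num_parts or 1.0  # avoid zero width if all x are equal
    parts = [{"owned": [], "halo": []} for _ in range(num_parts)]
    for idx, (x, _y) in enumerate(points):
        k = min(int((x - lo) / width), num_parts - 1)  # clamp x == hi into the last strip
        parts[k]["owned"].append(idx)
        # Copy the point into every other strip whose eps-extended range contains it
        for m in range(num_parts):
            left = lo + m * width
            if m != k and left - eps <= x <= left + width + eps:
                parts[m]["halo"].append(idx)
    return parts
```

Each worker would cluster its `owned` points and answer neighborhood queries from `owned + halo`. Clusters that share replicated points would still need to be merged across partitions afterwards.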

### A Distributed Algorithm for Intrinsic Cluster Detection over Large Spatial Data

- Computer Science
- 2008

The paper presents a Distributed Grid-based Density Clustering algorithm that can identify arbitrarily shaped embedded clusters as well as multi-density clusters in large spatial datasets.

### Exact, Fast and Scalable Parallel DBSCAN for Commodity Platforms

- Computer Science
- ICDCN
- 2017

The paper presents GridDBSCAN, a grid-based DBSCAN algorithm that is significantly faster than the state-of-the-art sequential DBSCAN and its parallel implementations. It also proposes scalable parallel implementations of GridDBSCAN that leverage a multicore commodity cluster.
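
Grid-based DBSCAN variants typically rely on one geometric fact. Suppose space is divided into cubic cells of side `eps / sqrt(d)`. Each cell's diagonal is then `eps`, so any two points in the same cell are eps-neighbors. It follows that a cell holding at least `min_pts` points contains only core points, and no distance computations are needed to establish this.

The sketch below shows only this dense-cell shortcut under that assumption. It is not the complete GridDBSCAN algorithm. Points in sparse cells would still need an explicit neighborhood check against nearby cells.

```python
import math
from collections import defaultdict


def grid_cells(points, eps):
    """Bucket d-dimensional points into cubic cells of side eps / sqrt(d)."""
    d = len(points[0])
    side = eps / math.sqrt(d)  # chosen so that the cell diagonal equals eps
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(math.floor(c / side) for c in p)
        cells[key].append(idx)
    return cells


def dense_cell_core_points(points, eps, min_pts):
    """Points in any cell holding >= min_pts points are core, with no distance test."""
    core = set()
    for members in grid_cells(points, eps).values():
        if len(members) >= min_pts:
            core.update(members)
    return core
```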

### HPDBSCAN: highly parallel DBSCAN

- Computer Science
- MLHPC@SC
- 2015

This paper employs three major techniques to break the sequentiality of DBSCAN, enable workload balancing, and speed up neighborhood searches in distributed parallel processing environments: (i) a computation-split heuristic for domain decomposition, (ii) a data-index preprocessing step, and (iii) a rule-based cluster merging scheme.

### A Distributed Shared Nearest Neighbors Clustering Algorithm

- Computer Science
- CIARP
- 2017

The paper introduces the Distributed Shared Nearest Neighbor based clustering algorithm (D-SNN). D-SNN works on disjoint partitions of the data and produces a global clustering solution whose performance is competitive with centralized approaches.
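
The sketch below computes, on a single site, the shared-nearest-neighbor similarity that SNN-based clustering builds on. It assumes the common definition: two points are similar only if each appears in the other's k-nearest-neighbor list, and their similarity is the number of neighbors they share. D-SNN's distributed partitioning and merging steps are not shown. The helper names are illustrative.

```python
import math


def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i], excluding i (ties broken by index)."""
    others = sorted((math.dist(points[i], q), j) for j, q in enumerate(points) if j != i)
    return {j for _, j in others[:k]}


def snn_similarity(points, k):
    """Map each mutually-listed pair (i, j), i < j, to its number of shared k-NN."""
    nn = [knn(points, i, k) for i in range(len(points))]
    sim = {}
    for i in range(len(points)):
        for j in nn[i]:
            if i < j and i in nn[j]:  # only pairs that list each other count
                sim[(i, j)] = len(nn[i] & nn[j])
    return sim
```

Pairs that are missing from the returned dict have similarity 0 under this definition.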

### Fast tree-based algorithms for DBSCAN on GPUs

- Computer Science
- arXiv
- 2021

This paper proposes a new general framework for DBSCAN on GPUs. Within this framework it proposes two tree-based algorithms that fuse neighbor search with updating the clustering information. The two algorithms differ in how they treat dense regions of the data.

## References


### A Database Interface for Clustering in Large Spatial Databases

- Computer Science
- KDD
- 1995

This paper presents an interface to the database management system (DBMS) based on a spatial access method, the R*-tree, which is crucial for the efficiency of KDD on large databases. It also proposes a method for spatial data sampling as part of the focusing component, which significantly reduces the number of objects to be clustered.

### A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise

- Computer Science
- KDD
- 1996

The paper presents DBSCAN, a new clustering algorithm that relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape. It requires only one input parameter and supports the user in determining an appropriate value for it.
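
The DBSCAN paper helps the user choose Eps with a sorted k-dist graph. Each point is mapped to the distance to its k-th nearest neighbor, and the values are sorted in descending order. The user then picks a threshold point, typically the first point in the first "valley" of the curve. That point's k-dist becomes Eps, and MinPts is set to k; the paper fixes k = 4 for 2-D data.

Below is a brute-force sketch of the graph itself. It assumes Euclidean distance and at least k + 1 points. The function name is illustrative.

```python
import math


def sorted_k_dist(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted in descending order."""
    k_dists = []
    for i, p in enumerate(points):
        # Distances to all other points, ascending; the (k-1)-th entry is the k-th nearest
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        k_dists.append(dists[k - 1])
    return sorted(k_dists, reverse=True)
```

Points to the left of the chosen threshold, which have a larger k-dist, are then treated as noise. All other points are assigned to some cluster.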

### A distribution-based clustering algorithm for mining in large spatial databases

- Computer Science
- Proceedings 14th International Conference on Data Engineering
- 1998

The paper introduces DBCLASD (Distribution-Based Clustering of LArge Spatial Databases), a new clustering algorithm for discovering clusters whose points are uniformly distributed. DBCLASD is attractive because it is nonparametric and finds clusters of arbitrary shape with good quality.

### Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications

- Computer Science
- Data Mining and Knowledge Discovery
- 1998

The generalized algorithm GDBSCAN can cluster point objects as well as spatially extended objects according to both their spatial and their nonspatial attributes. Four applications, using 2D points (astronomy), 3D points (biology), 5D points, and 2D polygons, demonstrate the applicability of GDBSCAN to real-world problems.

### Efficiency of Hierarchic Agglomerative Clustering using the ICL Distributed Array Processor

- Computer Science
- Journal of Documentation
- 1989

An analysis of the cycle times of the two machines is presented. It suggests that further, very substantial speed-ups could be obtained from array processors of this type if they were based on more powerful processing elements.

### BIRCH: A New Data Clustering Algorithm and Its Applications

- Computer Science
- Data Mining and Knowledge Discovery
- 1997

The paper proposes an efficient and scalable data clustering method based on a new in-memory data structure called the CF-tree, which serves as an in-memory summary of the data distribution. The method is implemented in a system called BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) and compared with other available methods.
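
BIRCH's CF-tree stores clustering features instead of raw points. Assume the usual definition CF = (N, LS, SS), where:

- N is the number of points,
- LS is their linear sum,
- SS is the sum of their squared norms.

Under this definition, two subclusters merge by component-wise addition. The centroid is LS / N, and the radius is sqrt(SS / N - |centroid|^2). The class below is an illustrative sketch of that arithmetic only. It omits CF-tree insertion, node splitting, and the threshold logic.

```python
import math
from dataclasses import dataclass


@dataclass
class CF:
    n: int      # number of points summarized
    ls: tuple   # linear sum of the points, per dimension
    ss: float   # sum of the squared norms of the points

    @staticmethod
    def of(point):
        """Clustering feature of a single point."""
        return CF(1, tuple(point), sum(c * c for c in point))

    def __add__(self, other):
        # CF additivity: merging two subclusters adds their summaries
        return CF(self.n + other.n,
                  tuple(a + b for a, b in zip(self.ls, other.ls)),
                  self.ss + other.ss)

    def centroid(self):
        return tuple(c / self.n for c in self.ls)

    def radius(self):
        # Root-mean-square distance of the members from the centroid
        c2 = sum(c * c for c in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))  # clamp rounding error
```

For example, `(CF.of((0.0, 0.0)) + CF.of((2.0, 0.0))).radius()` is `1.0`. The summary never needs the original points, which is what lets the CF-tree stay in memory.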

### A fast distributed algorithm for mining association rules

- Computer Science
- Fourth International Conference on Parallel and Distributed Information Systems
- 1996

The paper proposes FDM (Fast Distributed Mining of association rules), a distributed association rule mining algorithm. FDM generates a small number of candidate sets and substantially reduces the number of messages passed while mining association rules.
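
One observation behind FDM's small candidate sets is a pigeonhole argument. If an itemset meets the minimum support over the whole database, it must meet that support at one or more sites. Itemsets that are not locally large anywhere can therefore be pruned before any global counting.

The sketch below checks only this property on hypothetical per-site support counts. It does not show FDM's candidate generation or its message-reducing polling protocol. The function names are illustrative.

```python
def locally_large_somewhere(itemset_counts, site_sizes, min_support):
    """Itemsets whose support count reaches min_support * site size at some site.

    itemset_counts maps an itemset to a list of per-site support counts,
    aligned with site_sizes (the number of transactions at each site).
    """
    return {
        itemset
        for itemset, counts in itemset_counts.items()
        if any(c >= min_support * n for c, n in zip(counts, site_sizes))
    }


def globally_large(itemset_counts, site_sizes, min_support):
    """Itemsets whose total support count reaches min_support * database size."""
    total = sum(site_sizes)
    return {
        itemset
        for itemset, counts in itemset_counts.items()
        if sum(counts) >= min_support * total
    }
```

For any inputs, `globally_large(...)` is a subset of `locally_large_somewhere(...)`. Only the locally large itemsets therefore need to be considered as global candidates.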