A Fast Parallel Clustering Algorithm for Large Spatial Databases

Xiaowei Xu, Jochen Jäger, Hans-Peter Kriegel. Data Mining and Knowledge Discovery.
The clustering algorithm DBSCAN relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape as well as to distinguish noise. In this paper, we present PDBSCAN, a parallel version of this algorithm. We use the ‘shared-nothing’ architecture with multiple computers interconnected through a network. A fundamental component of a shared-nothing system is its distributed data structure. We introduce the dR*-tree, a distributed spatial index structure in which… 
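The shared-nothing setting can be illustrated with a small simulation. This is a hypothetical sketch, not the dR*-tree. Each simulated site (`Node`) keeps its own disjoint points and their minimum bounding rectangle (MBR). A region query is forwarded only to the sites whose rectangle intersects the query's eps-ball. All names are illustrative assumptions.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One simulated shared-nothing site holding a disjoint slice of the data (hypothetical)."""
    points: list = field(default_factory=list)

    def mbr(self):
        # Minimum bounding rectangle (xmin, ymin, xmax, ymax) of the local points.
        xs = [p[0] for p in self.points]
        ys = [p[1] for p in self.points]
        return min(xs), min(ys), max(xs), max(ys)

    def local_region_query(self, q, eps):
        # Brute-force local search; a real site would consult its own spatial index.
        return [p for p in self.points if math.dist(p, q) <= eps]


def mbr_intersects_ball(mbr, q, eps):
    # The eps-ball around q touches the rectangle iff q's distance to it is <= eps.
    xmin, ymin, xmax, ymax = mbr
    dx = max(xmin - q[0], 0.0, q[0] - xmax)
    dy = max(ymin - q[1], 0.0, q[1] - ymax)
    return math.hypot(dx, dy) <= eps


def distributed_region_query(nodes, q, eps):
    """Send the query only to sites whose MBR can contain an answer."""
    result, contacted = [], 0
    for node in nodes:
        if node.points and mbr_intersects_ball(node.mbr(), q, eps):
            contacted += 1
            result.extend(node.local_region_query(q, eps))
    return result, contacted


nodes = [Node([(0, 0), (1, 0)]), Node([(10, 10), (11, 10)])]
hits, contacted = distributed_region_query(nodes, (0.5, 0), 1.0)  # only the first site is contacted
```

The MBR test prunes whole sites, so the message count grows with the number of sites near the query rather than with the total number of sites.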

A Parallel Algorithm for Fast Density-based Clustering in Large Spatial Databases

PFDC, a parallel version of FDC, is presented. It is implemented in C++ using MPI message passing, runs on a variety of parallel platforms, and shows good speedup.

A new scalable parallel DBSCAN algorithm using the disjoint-set data structure

This work employs the disjoint-set data structure to break the access sequentiality of DBSCAN and uses a tree-based bottom-up approach to construct the clusters, which yields a better-balanced workload distribution.
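The sketch below shows the disjoint-set idea sequentially. It assumes Euclidean distance, a brute-force neighbor search, and a neighborhood count that includes the point itself. Core points within eps of each other are unioned, so cluster membership of core points does not depend on visiting order. Each border point is attached to its lowest-indexed core neighbor. The paper's parallel, tree-based bottom-up construction is not reproduced, and `DisjointSet` and `dbscan_union_find` are hypothetical names.

```python
import math


class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[max(ra, rb)] = min(ra, rb)


def dbscan_union_find(points, eps, min_pts):
    n = len(points)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbors]  # neighborhood includes the point itself
    ds = DisjointSet(n)
    # Pass 1: union every pair of core points within eps; these unions can run in any order.
    for i in range(n):
        if core[i]:
            for j in neighbors[i]:
                if core[j]:
                    ds.union(i, j)
    # Pass 2: core points take their root; non-core points join the first core neighbor.
    labels = [-1] * n
    for i in range(n):
        if core[i]:
            labels[i] = ds.find(i)
        else:
            for j in neighbors[i]:
                if core[j]:
                    labels[i] = ds.find(j)
                    break
    # Relabel roots as 0, 1, 2, ...; -1 marks noise.
    remap = {}
    return [remap.setdefault(l, len(remap)) if l != -1 else -1 for l in labels]


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan_union_find(pts, eps=1.5, min_pts=3)  # two clusters plus one noise point
```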

PaX-DBSCAN: a proposed algorithm for improved clustering

A new parallel algorithm for DBSCAN and an algorithm that extends the X-tree spatial indexing structure are proposed, together with a full description of how the system can be achieved.

Parallel Processing for Density-based Spatial Clustering Algorithm using Complex Grid Partitioning and Its Performance Evaluation

To improve the speedup of parallel processing, a new parallelization model for DBSCAN, one of the most fundamental algorithms for density-based spatial clustering, is proposed for multi-core CPUs using a spatial partitioning method.
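Spatial partitioning can be sketched under assumptions of our own: 2D points, equal-width vertical strips, and a thread pool standing in for the cores of a multi-core CPU. Each worker decides which of its owned points are core points. It also reads a halo of points lying within eps of its strip, so no neighbor is missed. The strip layout and all function names are hypothetical, and the paper's complex grid partitioning is not reproduced.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def strip_partitions(points, n_parts):
    """Assign each 2D point index to one of n_parts equal-width vertical strips."""
    xs = [p[0] for p in points]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_parts or 1.0  # avoid zero width when all x are equal
    parts = [[] for _ in range(n_parts)]
    for i, p in enumerate(points):
        k = min(int((p[0] - lo) / width), n_parts - 1)
        parts[k].append(i)
    return parts


def core_flags(points, owned, candidates, eps, min_pts):
    """Core test for the owned points, searching only this worker's candidates."""
    return {i: sum(math.dist(points[i], points[j]) <= eps for j in candidates) >= min_pts
            for i in owned}


def parallel_core_points(points, eps, min_pts, n_parts=4):
    jobs = []
    for owned in strip_partitions(points, n_parts):
        if not owned:
            continue
        xs = [points[i][0] for i in owned]
        lo, hi = min(xs) - eps, max(xs) + eps
        # Halo: any eps-neighbor of an owned point has its x within [lo, hi].
        candidates = [j for j, p in enumerate(points) if lo <= p[0] <= hi]
        jobs.append((owned, candidates))
    core = [False] * len(points)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        futures = [pool.submit(core_flags, points, o, c, eps, min_pts) for o, c in jobs]
        for future in futures:
            for i, is_core in future.result().items():
                core[i] = is_core
    return core


pts = [((i % 10) * 0.4, (i // 10) * 0.4) for i in range(100)]  # a 10 x 10 lattice
flags = parallel_core_points(pts, eps=1.0, min_pts=9, n_parts=3)
```

In CPython the global interpreter lock prevents real speedup for this pure-Python loop. A process pool or native code would be needed for that, so the sketch only illustrates the data decomposition.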

A Distributed Algorithm for Intrinsic Cluster Detection over Large Spatial Data

A Distributed Grid-based Density Clustering algorithm capable of identifying arbitrarily shaped embedded clusters as well as multi-density clusters over large spatial datasets is presented.
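The basic grid-density idea can be sketched with several assumptions: fixed square cells, a per-cell point-count threshold, and 8-connectivity between dense cells. Points in connected dense cells form one cluster, and all other points are noise. This single-machine sketch implements neither the paper's distribution nor its multi-density handling, and `grid_density_clusters` and its parameters are hypothetical.

```python
from collections import defaultdict, deque


def grid_density_clusters(points, cell_size, min_cell_pts):
    # Bucket 2D points into square cells.
    cells = defaultdict(list)
    for i, (x, y) in enumerate(points):
        cells[(int(x // cell_size), int(y // cell_size))].append(i)
    dense = {c for c, members in cells.items() if len(members) >= min_cell_pts}

    labels = [-1] * len(points)  # -1 = noise (point in a sparse cell)
    seen, cluster_id = set(), 0
    for start in sorted(dense):  # sorted for deterministic cluster ids
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        # Breadth-first search over dense cells that touch (8-connectivity).
        while queue:
            cx, cy = queue.popleft()
            for i in cells[(cx, cy)]:
                labels[i] = cluster_id
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        cluster_id += 1
    return labels


pts = [(0.1, 0.1), (0.2, 0.5), (0.5, 0.9),        # dense cell (0, 0)
       (1.1, 0.2), (1.5, 0.5), (1.9, 0.9),        # dense cell (1, 0), adjacent
       (10.1, 10.1), (10.2, 10.3), (10.5, 10.5),  # dense cell (10, 10)
       (5.5, 5.5)]                                # isolated point
labels = grid_density_clusters(pts, cell_size=1.0, min_cell_pts=3)
```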

Exact, Fast and Scalable Parallel DBSCAN for Commodity Platforms

A grid-based DBSCAN algorithm, GridDBSCAN, is presented that is significantly faster than the state-of-the-art sequential DBSCAN and its parallel implementations; scalable parallel implementations of GridDBSCAN are also proposed to leverage a multicore commodity cluster.
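Grids are a common way to accelerate exact DBSCAN neighbor queries. The sketch below shows this generically and is not GridDBSCAN itself. With a cell side of eps, every eps-neighbor of a point lies in the point's own cell or in one of the adjacent cells (3^d cells in total). Only those cells need exact distance checks. The function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def cell_of(p, eps):
    return tuple(math.floor(c / eps) for c in p)


def build_grid(points, eps):
    # Cell side eps: coordinates differing by <= eps map to cells at most 1 apart.
    grid = defaultdict(list)
    for i, p in enumerate(points):
        grid[cell_of(p, eps)].append(i)
    return grid


def grid_region_query(points, grid, eps, i):
    """Exact eps-neighborhood of point i, checking only the 3^d surrounding cells."""
    cell = cell_of(points[i], eps)
    result = []
    for offset in product((-1, 0, 1), repeat=len(cell)):
        nb = tuple(c + o for c, o in zip(cell, offset))
        for j in grid.get(nb, ()):  # .get avoids inserting empty cells
            if math.dist(points[i], points[j]) <= eps:
                result.append(j)
    return sorted(result)


pts = [((i % 7) * 0.4, (i // 7) * 0.45) for i in range(49)]
grid = build_grid(pts, 0.5)
first = grid_region_query(pts, grid, 0.5, 0)  # neighbors of the corner point
```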

HPDBSCAN: highly parallel DBSCAN

This paper employs three major techniques to break the sequentiality, improve workload balancing, and speed up neighborhood searches in distributed parallel processing environments: (i) a computation-split heuristic for domain decomposition, (ii) a data-index preprocessing step, and (iii) a rule-based cluster-merging scheme.
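The sketch below is a much simpler, count-based stand-in for a computation-split heuristic. It assumes points are binned into x-cells of width `cell_size`. Whole cells are then assigned to workers in order until each worker holds roughly its proportional share of points. HPDBSCAN's actual heuristic is not given in the summary and may weigh cells differently, and `balanced_cell_split` is a hypothetical name.

```python
from collections import Counter


def balanced_cell_split(points, cell_size, n_workers):
    """Return a worker id per point, keeping whole x-cells together."""
    counts = Counter(int(p[0] // cell_size) for p in points)
    total = len(points)
    assignment = {}  # x-cell index -> worker id
    worker, running = 0, 0
    for cell in sorted(counts):
        assignment[cell] = worker
        running += counts[cell]
        # Move on once this worker has reached its proportional share of points.
        if running >= (worker + 1) * total / n_workers and worker < n_workers - 1:
            worker += 1
    return [assignment[int(p[0] // cell_size)] for p in points]


pts = [(0.5, 0.0)] * 4 + [(1.5, 0.0)] * 2 + [(2.5, 0.0)] * 2 + [(3.5, 0.0)] * 4
split = balanced_cell_split(pts, cell_size=1.0, n_workers=2)
```

Each worker would still need the points in the cells bordering its range as a halo to answer eps-queries near its boundary.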

On distributing the clustering process

A Distributed Shared Nearest Neighbors Clustering Algorithm

The Distributed Shared Nearest Neighbor based clustering algorithm (D-SNN) is introduced. It works on disjoint partitions of the data and produces a global clustering solution whose performance is competitive with centralized approaches.
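SNN-style clustering builds on a shared-nearest-neighbor similarity, sketched here with a brute-force k-NN search. Two points are linked only if each appears in the other's k-nearest-neighbor list. The link strength is the number of neighbors they share. This follows the common SNN-graph definition, which is an assumption here rather than something taken from the summary. How D-SNN distributes this across partitions is not shown.

```python
import math


def knn_lists(points, k):
    """k nearest neighbors of each point (excluding itself), ties broken by index."""
    result = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: (math.dist(p, points[j]), j))
        result.append(set(others[:k]))
    return result


def snn_similarity(knn, i, j):
    # Linked only if i and j are in each other's k-NN lists; weight = shared neighbors.
    if j in knn[i] and i in knn[j]:
        return len(knn[i] & knn[j])
    return 0


pts = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
knn = knn_lists(pts, k=2)
```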

Fast tree-based algorithms for DBSCAN on GPUs

This paper proposes a new general framework for DBSCAN on GPUs and, within that framework, two tree-based algorithms that fuse the neighbor search with updating the clustering information and differ in their treatment of dense regions of the data.

A Database Interface for Clustering in Large Spatial Databases

This paper presents an interface to the database management system (DBMS) based on a spatial access method, the R*-tree, which is crucial for the efficiency of KDD on large databases. It also proposes a method for spatial data sampling as part of the focusing component, which significantly reduces the number of objects to be clustered.
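The summary does not say how the spatial sample is drawn. As a hypothetical illustration of spatially stratified sampling, the sketch buckets points into uniform grid cells, which stand in for the pages of a spatial index such as the R*-tree. For each occupied cell it keeps the point nearest the cell center. `representative_sample` is an illustrative name.

```python
import math
from collections import defaultdict


def representative_sample(points, cell_size):
    """Keep one 2D point per occupied cell: the one nearest the cell center."""
    cells = defaultdict(list)
    for p in points:
        cells[(math.floor(p[0] / cell_size), math.floor(p[1] / cell_size))].append(p)
    sample = []
    for (cx, cy), members in sorted(cells.items()):
        center = ((cx + 0.5) * cell_size, (cy + 0.5) * cell_size)
        sample.append(min(members, key=lambda p: math.dist(p, center)))
    return sample


pts = [(0.1, 0.1), (0.5, 0.4), (0.9, 0.9), (3.2, 3.3)]
sample = representative_sample(pts, cell_size=1.0)  # 4 points reduced to 2 representatives
```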

A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise

DBSCAN, a new clustering algorithm that relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape, is presented. It requires only one input parameter and supports the user in determining an appropriate value for it.
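The following is a minimal DBSCAN sketch with a brute-force region query. It assumes Euclidean distance, the usual eps and MinPts parameters, and a neighborhood that includes the point itself. The sketch exposes both parameters; fixing MinPts leaves eps as the single value to choose. `sorted_k_dist` shows one way to support that choice: the distances to each point's k-th nearest neighbor, sorted in descending order, whose "knee" suggests a threshold. The names are ours, not the paper's.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    labels = [None] * len(points)  # None = not yet visited
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region_query(points, i, eps)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbors = region_query(points, j, eps)
            if len(neighbors) >= min_pts:  # j is a core point: expand through it
                queue.extend(neighbors)
        cluster += 1
    return labels


def sorted_k_dist(points, k):
    """Distance from each point to its k-th nearest other point, in descending order."""
    kd = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        kd.append(d[k - 1])
    return sorted(kd, reverse=True)


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(pts, eps=1.5, min_pts=3)
kdist = sorted_k_dist(pts, k=1)  # one large value (the outlier), then a flat tail
```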

A distribution-based clustering algorithm for mining in large spatial databases

The new clustering algorithm DBCLASD (Distribution-Based Clustering of LArge Spatial Databases) is introduced to discover clusters of this type; it is very attractive because of its nonparametric nature and its good quality for clusters of arbitrary shape.

Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications

The generalized algorithm GDBSCAN can cluster point objects as well as spatially extended objects according to both their spatial and their nonspatial attributes. Four applications, using 2D points (astronomy), 3D points (biology), 5D points and 2D polygons, are presented, demonstrating the applicability of GDBSCAN to real-world problems.
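The generalization can be sketched by replacing DBSCAN's eps-distance test with an arbitrary neighborhood predicate and its point count with a weighted cardinality. In this hypothetical example, axis-aligned rectangles stand in for polygons, "neighbor" means "intersects", and the weight is total area. These choices, and the assumption that the predicate is reflexive and symmetric, are ours.

```python
def gdbscan(objects, npred, wcard, min_card):
    """DBSCAN with a pluggable neighborhood predicate and weighted cardinality."""
    NOISE = -1
    labels = [None] * len(objects)

    def neighborhood(i):
        # All objects satisfying the (assumed reflexive, symmetric) predicate.
        return [j for j, o in enumerate(objects) if npred(objects[i], o)]

    def is_core(nbhd):
        # Core test uses a weighted cardinality instead of a plain count.
        return wcard([objects[j] for j in nbhd]) >= min_card

    cluster = 0
    for i in range(len(objects)):
        if labels[i] is not None:
            continue
        nbhd = neighborhood(i)
        if not is_core(nbhd):
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in nbhd if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border object
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighborhood(j)
            if is_core(nb):
                queue.extend(nb)
        cluster += 1
    return labels


def intersects(a, b):
    # Rectangles (xmin, ymin, xmax, ymax) standing in for polygons.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def total_area(rects):
    return sum((r[2] - r[0]) * (r[3] - r[1]) for r in rects)


rects = [(0, 0, 2, 2), (1, 1, 3, 3), (2.5, 2.5, 4, 4), (10, 10, 11, 11)]
labels = gdbscan(rects, intersects, total_area, min_card=5)
```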

Parallel clustering algorithms

Efficiency of Hierarchic Agglomerative Clustering using the ICL Distributed array Processor

An analysis of the cycle times of the two machines is presented; it suggests that further, very substantial speed-ups could be obtained from array processors of this type if they were based on more powerful processing elements.

BIRCH: A New Data Clustering Algorithm and Its Applications

An efficient and scalable data clustering method is proposed, based on a new in-memory data structure called the CF-tree, which serves as an in-memory summary of the data distribution. The method is implemented in a system called BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) and compared with other available methods.
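The CF-tree stores clustering features. For a subcluster, a clustering feature is the triple (N, LS, SS): the point count, the linear sum of the points, and the sum of their squared norms. Because these entries are additive, subclusters can be merged and summarized without revisiting the points. The sketch covers only the CF arithmetic; the tree's insertion and node-splitting logic is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class CF:
    n: int      # number of points
    ls: tuple   # linear sum of the points
    ss: float   # sum of squared norms of the points

    @classmethod
    def of_point(cls, p):
        return cls(1, tuple(p), sum(x * x for x in p))

    def __add__(self, other):
        # Additivity: merging two subclusters just adds their CF entries.
        return CF(self.n + other.n,
                  tuple(a + b for a, b in zip(self.ls, other.ls)),
                  self.ss + other.ss)

    def centroid(self):
        return tuple(x / self.n for x in self.ls)

    def radius(self):
        # RMS distance of member points to the centroid: sqrt(SS/N - |LS/N|^2).
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(x * x for x in c), 0.0))


cf = CF.of_point((0, 0))
for p in [(2, 0), (0, 2), (2, 2)]:
    cf = cf + CF.of_point(p)
```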

Parallel Algorithms for Hierarchical Clustering

C. Olson. Parallel Comput., 1995.

A fast distributed algorithm for mining association rules

A distributed association rule mining algorithm, FDM (fast distributed mining of association rules), is proposed. It generates a small number of candidate sets and substantially reduces the number of messages passed when mining association rules.
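One property makes candidate pruning sound when the data are partitioned across sites. An itemset that is globally large must be locally large at at least one site. Each site therefore only needs to propose its locally large itemsets, and only those candidates' counts are exchanged. This sketch assumes a fractional minimum support. For brevity it enumerates local candidates by brute force instead of generating them level-wise, so it is not FDM's full algorithm. `fdm_style_large_itemsets` is a hypothetical name.

```python
from itertools import combinations


def support_count(transactions, itemset):
    return sum(1 for t in transactions if itemset <= t)


def fdm_style_large_itemsets(sites, min_support, size):
    """Globally large itemsets of a given size over horizontally partitioned data."""
    total = sum(len(db) for db in sites)
    items = sorted(set().union(*(t for db in sites for t in db)))
    candidates = set()
    for db in sites:
        # Local pruning: a site proposes only itemsets that are locally large there.
        for combo in combinations(items, size):
            s = frozenset(combo)
            if support_count(db, s) >= min_support * len(db):
                candidates.add(s)
    # Only the surviving candidates' counts need to be exchanged between sites.
    large = {c for c in candidates
             if sum(support_count(db, c) for db in sites) >= min_support * total}
    return large, candidates


sites = [[{"a", "b"}, {"a", "c"}, {"a", "b"}],
         [{"b", "c"}, {"a", "b"}, {"c"}]]
large, candidates = fdm_style_large_itemsets(sites, min_support=0.5, size=2)
```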