OPTICS: ordering points to identify the clustering structure

  • Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Jörg Sander
  • ACM SIGMOD Conference
Cluster analysis is a primary method for database mining. It is either used as a stand-alone tool to get insight into the distribution of a data set, e.g. to focus further analysis and data processing, or as a preprocessing step for other algorithms operating on the detected clusters. Almost all of the well-known clustering algorithms require input parameters which are hard to determine but have a significant influence on the clustering result. Furthermore, for many real-data sets there does… 

A Cluster Algorithm Identifying the Clustering Structure

  • Zhi-Wei Sun
  • Computer Science
    2008 International Conference on Computer Science and Software Engineering
  • 2008
Both theoretical analysis and experimental results confirm that CluICS can cluster data of varying density by automatically setting different parameters in different partitions, and that it is much more efficient than the DBSCAN algorithm.

An efficient density-based clustering for multi-dimensional database

A density-based clustering algorithm that adopts a divide-and-conquer strategy and presents a way to automatically determine the grid cell width, showing that the proposed algorithm efficiently finds accurate clusters in both low-dimensional and multi-dimensional databases.

Adaptive Methods for Determining DBSCAN Parameters

The objective is to enhance the existing DBSCAN algorithm by automatically selecting the input parameters and finding clusters of varied density; the proposed algorithm discovers arbitrarily shaped clusters, requires no input parameters, and uses the same definitions as the DBSCAN algorithm.

ICA: An Incremental Clustering Algorithm Based on OPTICS

A detailed comparison of ICA and OPTICS is presented, and the results illustrate that ICA is much more suitable for clustering dynamic datasets, i.e., datasets to which new data objects are added as time goes on.

Improving OPTICS Algorithm with Imperialist Competitive Algorithm: Choosing Automatically Best Parameters

The main goal of this research is to use meta-heuristic methods, especially the Imperialist Competitive Algorithm (ICA), to estimate the parameters (ε, µ) precisely, so that the OPTICS algorithm can use them to produce accurate, high-quality clusters for any data set.

AutoEpsDBSCAN : DBSCAN with Eps Automatic for Large Dataset

The proposed enhanced algorithm can detect clusters of varied density with different shapes and sizes in large amounts of data containing noise and outliers, requires only one input parameter, and gives better output than the existing DBSCAN algorithm.

An Empirical Evaluation of Density-Based Clustering Techniques

This paper compares two density-based clustering methods, DBSCAN (15) and OPTICS (14), on parameters essential to a good clustering algorithm, such as distance type, noise ratio, the run time of the simulations performed, and the number of clusters formed.


This paper introduces a new concept which is used to develop a new recursive version of DBSCAN that can successfully perform hierarchical clustering, called Level-Set Clustering (LSC), and is able to produce the clustering result with the same O(n log n) time complexity.



CURE: an efficient clustering algorithm for large databases

This work proposes a new clustering algorithm called CURE that is more robust to outliers, and identifies clusters having non-spherical shapes and wide variances in size, and demonstrates that random sampling and partitioning enable CURE to not only outperform existing algorithms but also to scale well for large databases without sacrificing clustering quality.

A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise

DBSCAN is presented, a new clustering algorithm that relies on a density-based notion of clusters, is designed to discover clusters of arbitrary shape, requires only one input parameter, and supports the user in determining an appropriate value for it.
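
The density-based notion of clusters mentioned in this summary can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the brute-force neighborhood search, and the choice to expose both a radius `eps` and a minimum neighbor count `min_pts` as explicit arguments are assumptions made for this illustration.

```python
import math

NOISE = -1  # label used in this sketch for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        cluster_id += 1
        labels[i] = cluster_id
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels
```

Because clusters grow by chaining dense neighborhoods rather than by distance to a center, the sketch can follow clusters of arbitrary shape; the quadratic neighborhood search is a simplification and would normally be replaced by a spatial index.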

BIRCH: an efficient data clustering method for very large databases

A data clustering method named BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) is presented, and it is demonstrated that it is especially suitable for very large databases.

A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining

This paper presents an algorithm, called k-modes, that extends the k-means paradigm to categorical domains: it introduces new dissimilarity measures to deal with categorical objects, replaces the means of clusters with modes, and uses a frequency-based method to update the modes during clustering so as to minimise the clustering cost function.
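
The three ingredients named in this summary (a dissimilarity measure for categorical objects, modes in place of means, and frequency-based mode updates) can be sketched as follows. This is a simplified batch variant written for illustration, not the paper's exact procedure: the function names, the use of simple attribute mismatch counting as the dissimilarity, and the caller-supplied initial modes are all assumptions.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Most frequent category of each attribute: the mode of a cluster."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(records, initial_modes, max_iter=10):
    """Alternate between assigning records to the nearest mode and updating modes."""
    modes = list(initial_modes)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(modes)), key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in records
        ]
        if new_assignment == assignment:
            break  # no record changed cluster, so the modes are stable
        assignment = new_assignment
        for j in range(len(modes)):
            members = [r for r, a in zip(records, assignment) if a == j]
            if members:  # keep the old mode if a cluster ends up empty
                modes[j] = column_modes(members)
    return assignment, modes
```

A call such as `k_modes(records, [records[0], records[3]])` seeds the two modes with existing records; how the initial modes are chosen is left open in this sketch.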

WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases

WaveCluster is proposed, a novel clustering approach based on wavelet transforms which can effectively identify arbitrary shape clusters at different degrees of accuracy and is highly efficient in terms of time complexity.

An Efficient Approach to Clustering in Large Multimedia Databases with Noise

A new algorithm for clustering in large multimedia databases called DENCLUE (DENsity-based CLUstEring) is introduced, which has a firm mathematical basis, has good clustering properties in data sets with large amounts of noise, allows a compact mathematical description of arbitrarily shaped clusters in high-dimensional data sets and is significantly faster than existing algorithms.

Grid-clustering: an efficient hierarchical clustering method for very large data sets

  • E. Schikuta
  • Computer Science
    Proceedings of 13th International Conference on Pattern Recognition
  • 1996
A new approach to hierarchical clustering of very large data sets is presented with the GRIDCLUS algorithm, which uses a multidimensional grid data structure to organize the value space surrounding the pattern values, rather than organizing the patterns themselves.

Incremental Clustering for Mining in a Data Warehousing Environment

It can be proven that the incremental algorithm yields the same result as DBSCAN, which is applicable to any database containing data from a metric space, e.g., to a spatial database or to a WWW-log database.

Automatic subspace clustering of high dimensional data for data mining applications

CLIQUE is presented, a clustering algorithm that satisfies the requirements of data mining applications, including the ability to find clusters embedded in subspaces of high-dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records.

The BANG-Clustering System: Grid-Based Data Analysis

The BANG-Clustering system presented in this paper is a novel approach to hierarchical data analysis and uses a multidimensional grid data structure to organize the value space surrounding the pattern values.