• Corpus ID: 6103609

Optimal Grid-Clustering: Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering

Alexander Hinneburg and Daniel A. Keim
Many applications require the clustering of large amounts of high-dimensional data. Most clustering algorithms, however, do not work effectively and efficiently in high-dimensional space, which is due to the so-called "curse of dimensionality". In addition, the high-dimensional data often contains a significant amount of noise, which causes additional effectiveness problems. In this paper, we review and compare the existing algorithms for clustering high-dimensional data and show the impact of the…

Mining for High Dimensional Clusters using Projections and Visualizations

Many applications require the clustering of large amounts of high-dimensional data. Most automated clustering techniques, however, do not work effectively and/or efficiently on high-dimensional…

Clustering High-Dimensional Data with Low-Order Neighbors

This paper presents a new approach that takes objects (or points) as the atomic units, so that the restriction of cell size can be relaxed without degrading the resolution of clustering results.

Subspace Clustering of High Dimensional Spatial Data with Noises

This paper presents a new subspace clustering method, called SCI (Subspace Clustering based on Information), which combines Shannon information with grid-based and density-based clustering techniques to cluster large, high-dimensional spatial data sets that contain noise.

Redefining Clustering for High-Dimensional Applications

This work introduces a very general concept of projected clustering which is able to construct clusters in arbitrarily aligned subspaces of lower dimensionality, and provides a new concept of using extended cluster feature vectors in order to make the algorithm scalable for very large databases.

Effective Subspace Clustering with Dimension Pairing in the Presence of High Levels of Noise

A new method, MAXCLUS, is proposed that first identifies subspaces where clusters could be located and then pinpoints the clusters in each subspace. It is very efficient and accurate with little or no parameter adjustment on a wide range of problems, outperforming existing approaches.

Clustering high dimensional data

  • I. Assent
  • WIREs Data Mining Knowl. Discov.
  • 2012
An overview of the effects of high-dimensional spaces and their implications for different clustering paradigms is provided, with pointers to the literature and to the open research issues that remain.

M-Denclue for Effective Data Clustering in High Dimensional Non-Linear Data

This research work concentrates on devising an enhanced algorithm for clustering high dimensional non-linear data to overcome the "curse of dimensionality".

Breaking Down Dimensionality : Effective and Efficient Feature Selection for High-Dimensional Clustering

A novel feature selection method is proposed for clustering (or, more generally, exploring) high-dimensional data. It is based on a generic measure of cluster tendency between dimensions and is very robust in correctly identifying subspaces of various dimensionalities that contain clusters of various sizes.

PCS: An Efficient Clustering Method for High-Dimensional Data

PCS (Pairwise Consensus Scheme), a set-theoretic clustering method for high-dimensional data, is proposed; it constructs a near-optimal consensus clustering from a set of projected clusterings and uses it as the final clustering of the original data set.

O-Cluster: scalable clustering of large high dimensional data sets

  • B. Milenova, M. Campos
  • 2002 IEEE International Conference on Data Mining, 2002. Proceedings.
  • 2002
This work proposes a new clustering algorithm called O-Cluster, which combines a novel active sampling technique with an axis-parallel partitioning strategy to identify continuous areas of high density in the input space and demonstrates its excellent scalability.



BIRCH: an efficient data clustering method for very large databases

A data clustering method named BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) is presented, and it is demonstrated that it is especially suitable for very large databases.
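
BIRCH summarizes each subcluster by a clustering feature (CF) instead of keeping its points. The sketch below is a minimal illustration, assuming the standard CF triple (N, LS, SS): point count, linear sum, and sum of squared norms. It omits the height-balanced CF-tree, the absorption threshold, and the global clustering phase, and the class and method names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ClusteringFeature:
    """Summary of a subcluster: point count N, linear sum LS, square sum SS."""
    n: int
    ls: List[float]
    ss: float

    @classmethod
    def from_point(cls, x: List[float]) -> "ClusteringFeature":
        # A single point is a subcluster with N = 1.
        return cls(1, list(x), sum(v * v for v in x))

    def merge(self, other: "ClusteringFeature") -> "ClusteringFeature":
        # CF vectors are additive, so merging never revisits the raw points.
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            self.ss + other.ss,
        )

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def radius(self) -> float:
        # Root-mean-square distance of the members to the centroid:
        # R^2 = SS / N - ||LS / N||^2
        c = self.centroid()
        r2 = self.ss / self.n - sum(v * v for v in c)
        return math.sqrt(max(r2, 0.0))  # clamp tiny negative rounding errors


# Usage: summarize the four corners of a 2 x 2 square.
points = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
cf = ClusteringFeature.from_point(points[0])
for p in points[1:]:
    cf = cf.merge(ClusteringFeature.from_point(p))
print(cf.n, cf.centroid(), cf.radius())
```

Because centroid and radius are derived from three running sums, a subcluster can absorb new points in constant space, which is what makes the approach attractive for very large databases.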

An Efficient Approach to Clustering in Large Multimedia Databases with Noise

A new algorithm for clustering in large multimedia databases, called DENCLUE (DENsity-based CLUstEring), is introduced. It has a firm mathematical basis, has good clustering properties in data sets with large amounts of noise, allows a compact mathematical description of arbitrarily shaped clusters in high-dimensional data sets, and is significantly faster than existing algorithms.
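
DENCLUE models the overall density as the sum of influence functions of the data points and defines clusters through density attractors, the local maxima reached by hill climbing; points whose attractor density falls below a threshold ξ are treated as noise. The sketch below assumes a Gaussian influence function and fixed-size steps along the normalized gradient. The `merge_tol` rule for deciding that two climbs reached the same attractor is a hypothetical simplification, and the grid-based speedups and arbitrary-shape cluster merging are omitted.

```python
import math
from typing import List, Sequence, Tuple


def density(x: Sequence[float], data: List[Sequence[float]], sigma: float) -> float:
    # Overall density = sum of Gaussian influence functions of all data points.
    return sum(math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2)) for y in data)


def ascent_direction(x, data, sigma) -> List[float]:
    # Gradient of the density, up to the positive factor 1 / sigma^2.
    g = [0.0] * len(x)
    for y in data:
        w = math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2))
        for i in range(len(x)):
            g[i] += (y[i] - x[i]) * w
    return g


def hill_climb(x, data, sigma, step=0.1, max_iter=500) -> Tuple[List[float], float]:
    # Move in fixed-size steps along the normalized gradient while density increases.
    x = list(x)
    f = density(x, data, sigma)
    for _ in range(max_iter):
        g = ascent_direction(x, data, sigma)
        norm = math.sqrt(sum(v * v for v in g))
        if norm == 0.0:
            break
        cand = [xi + step * gi / norm for xi, gi in zip(x, g)]
        f_cand = density(cand, data, sigma)
        if f_cand <= f:
            break  # x approximates a density attractor
        x, f = cand, f_cand
    return x, f


def denclue(data, sigma, xi, step=0.1, merge_tol=0.5) -> List[int]:
    attractors: List[List[float]] = []
    labels: List[int] = []
    for p in data:
        a, f = hill_climb(p, data, sigma, step)
        if f < xi:
            labels.append(-1)  # attractor density below xi: noise
            continue
        for label, known in enumerate(attractors):
            if math.dist(a, known) <= merge_tol:
                labels.append(label)  # same attractor (within tolerance)
                break
        else:
            attractors.append(a)
            labels.append(len(attractors) - 1)
    return labels


# Usage: two tight groups of four points plus one isolated point.
data = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2),
        (5.0, 5.0), (5.2, 5.0), (5.0, 5.2), (5.2, 5.2),
        (20.0, 20.0)]
labels = denclue(data, sigma=0.5, xi=1.5)
print(labels)
```

Here σ controls how far each point's influence reaches and ξ separates clusters from noise; the isolated point climbs nowhere and its own density stays below ξ.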

Automatic subspace clustering of high dimensional data for data mining applications

CLIQUE, a clustering algorithm that satisfies each of the following requirements of data mining applications, is presented: the ability to find clusters embedded in subspaces of high-dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records.
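
CLIQUE partitions every dimension into ξ equal-length intervals and calls a unit dense if the fraction of points it contains exceeds τ. Dense units are found bottom-up, because a k-dimensional unit can only be dense if all of its (k−1)-dimensional projections are dense. The sketch below assumes attributes normalized to [0, 1] and stops at the dense units; connecting them into clusters, subspace pruning, and generating minimal cluster descriptions are left out. All function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Unit = Tuple[Tuple[int, int], ...]  # sorted ((dimension, interval index), ...)


def interval(value: float, xi: int) -> int:
    # Assumes every attribute is normalized to [0, 1].
    return min(int(value * xi), xi - 1)


def support(units: List[Unit], data: List[Sequence[float]], xi: int) -> Dict[Unit, int]:
    counts = {u: 0 for u in units}
    for p in data:
        for u in units:
            if all(interval(p[d], xi) == i for d, i in u):
                counts[u] += 1
    return counts


def dense_units(data, xi: int, tau: float, max_dim: int) -> Dict[int, Set[Unit]]:
    n, dims = len(data), len(data[0])
    candidates: List[Unit] = [((d, i),) for d in range(dims) for i in range(xi)]
    result: Dict[int, Set[Unit]] = {}
    k = 1
    while candidates and k <= max_dim:
        counts = support(candidates, data, xi)
        dense = {u for u, c in counts.items() if c / n > tau}
        if not dense:
            break
        result[k] = dense
        # Join k-dim dense units sharing their first k-1 entries and ending in
        # different dimensions; keep a candidate only if all its k-dim
        # projections are dense (the Apriori-style pruning step).
        nxt = set()
        for a, b in combinations(sorted(dense), 2):
            if a[:-1] == b[:-1] and a[-1][0] < b[-1][0]:
                cand = a + (b[-1],)
                if all(sub in dense for sub in combinations(cand, k)):
                    nxt.add(cand)
        candidates = sorted(nxt)
        k += 1
    return result


# Usage: ten points packed into one cell of dimensions 0 and 1,
# spread evenly along dimension 2.
data = [(0.1 + 0.01 * j, 0.1 + 0.01 * j, (j + 0.5) / 10) for j in range(10)]
units = dense_units(data, xi=5, tau=0.3, max_dim=3)
print(units)
```

The cluster lives in the subspace of dimensions 0 and 1, so the search finds a dense 2-dimensional unit there and no dense unit involving dimension 2.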

WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases

WaveCluster is proposed, a novel clustering approach based on wavelet transforms which can effectively identify arbitrary shape clusters at different degrees of accuracy and is highly efficient in terms of time complexity.
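
WaveCluster quantizes the feature space into a grid, applies a wavelet transform to the grid, and finds connected components of significant cells in the transformed space; coarser transform levels yield coarser clusterings. The sketch below assumes 2-D data in [0, 1)², a single level of the Haar transform (keeping only the averaging subband), and a hypothetical significance threshold. It does not reproduce the paper's specific choice of wavelet, and the function names are hypothetical.

```python
from collections import deque
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def quantize(points: Sequence[Tuple[float, float]], m: int) -> Grid:
    # Count points per cell of an m x m grid over [0, 1)^2.
    grid = [[0.0] * m for _ in range(m)]
    for x, y in points:
        grid[min(int(x * m), m - 1)][min(int(y * m), m - 1)] += 1
    return grid


def haar_average(grid: Grid) -> Grid:
    # One level of the 2-D Haar transform, keeping only the low-frequency
    # (average) subband, scaled here as a plain mean of each 2 x 2 block.
    h = len(grid) // 2
    return [[(grid[2 * i][2 * j] + grid[2 * i + 1][2 * j]
              + grid[2 * i][2 * j + 1] + grid[2 * i + 1][2 * j + 1]) / 4
             for j in range(h)] for i in range(h)]


def connected_components(grid: Grid, threshold: float) -> Tuple[List[List[int]], int]:
    # Label 4-connected cells whose value exceeds the threshold; -1 = no cluster.
    m = len(grid)
    label = [[-1] * m for _ in range(m)]
    count = 0
    for i in range(m):
        for j in range(m):
            if grid[i][j] <= threshold or label[i][j] != -1:
                continue
            label[i][j] = count
            queue = deque([(i, j)])
            while queue:
                a, b = queue.popleft()
                for u, v in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                    if 0 <= u < m and 0 <= v < m and grid[u][v] > threshold and label[u][v] == -1:
                        label[u][v] = count
                        queue.append((u, v))
            count += 1
    return label, count


def wave_cluster(points, m: int = 8, threshold: float = 0.5) -> Tuple[List[int], int]:
    coarse = haar_average(quantize(points, m))  # assumes m is even
    label, count = connected_components(coarse, threshold)
    # Map each point back to its coarse cell (two fine cells per coarse cell and axis).
    return [label[min(int(x * m), m - 1) // 2][min(int(y * m), m - 1) // 2]
            for x, y in points], count


points = [(0.05, 0.05), (0.1, 0.1), (0.15, 0.15), (0.1, 0.05),
          (0.8, 0.8), (0.85, 0.85), (0.8, 0.85), (0.85, 0.8),
          (0.1, 0.9)]
labels, n_clusters = wave_cluster(points)
print(labels, n_clusters)
```

Applying `haar_average` repeatedly before labeling would give progressively coarser clusterings, which is the multi-resolution aspect in miniature.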

A distribution-based clustering algorithm for mining in large spatial databases

The new clustering algorithm DBCLASD (Distribution-Based Clustering of LArge Spatial Databases) is introduced to discover clusters of this type and is very attractive when considering its nonparametric nature and its good quality for clusters of arbitrary shape.

A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise

DBSCAN, a new clustering algorithm that relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape, is presented. It requires only one input parameter and supports the user in determining an appropriate value for it.
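
The density-based notion of clusters can be sketched as follows. This minimal version keeps the usual Eps and MinPts parameters, with `min_pts` defaulting to 4 on the assumption that MinPts is held fixed so that Eps is the one value the user tunes. Region queries here are a linear scan rather than a spatial index, so the sketch is quadratic in the number of points; the function names are hypothetical.

```python
import math
from typing import List, Sequence

NOISE = -1
UNVISITED = -2


def region_query(data: List[Sequence[float]], i: int, eps: float) -> List[int]:
    # Indices of all points within distance eps of point i (including i itself).
    return [j for j, q in enumerate(data) if math.dist(data[i], q) <= eps]


def dbscan(data: List[Sequence[float]], eps: float, min_pts: int = 4) -> List[int]:
    labels = [UNVISITED] * len(data)
    cluster = 0
    for i in range(len(data)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(data, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may still become a border point later
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: density-reachable, not core
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(data, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep expanding
        cluster += 1
    return labels


data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (5, 5)]
print(dbscan(data, eps=1.5))
```

Clusters grow only through core points, which is why arbitrarily shaped clusters can be traced while the isolated point at (5, 5) stays labeled as noise.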

STING: A Statistical Information Grid Approach to Spatial Data Mining

The idea is to capture statistical information associated with spatial cells in such a manner that whole classes of queries and clustering problems can be answered without recourse to the individual objects.
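
STING divides the spatial area into rectangular cells at several levels of resolution, stores statistics for each cell (such as count, mean, standard deviation, minimum, maximum, and distribution type), and computes the statistics of higher-level cells from those of their children. The sketch below keeps only count, mean, minimum, and maximum of one numeric attribute, assumes object coordinates in [0, 1)², and uses a quadrant hierarchy; the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Key = Tuple[int, int]


@dataclass
class CellStats:
    n: int        # number of objects in the cell
    mean: float   # mean of the attribute value
    lo: float     # minimum attribute value
    hi: float     # maximum attribute value


EMPTY = CellStats(0, 0.0, math.inf, -math.inf)


def leaf_level(objects: Sequence[Tuple[float, float, float]], m: int) -> Dict[Key, CellStats]:
    # Bottom layer: scan the objects once and summarize each cell of an m x m grid.
    buckets: Dict[Key, List[float]] = {}
    for x, y, value in objects:
        key = (min(int(x * m), m - 1), min(int(y * m), m - 1))
        buckets.setdefault(key, []).append(value)
    grid: Dict[Key, CellStats] = {}
    for i in range(m):
        for j in range(m):
            vals = buckets.get((i, j))
            grid[(i, j)] = (CellStats(len(vals), sum(vals) / len(vals), min(vals), max(vals))
                            if vals else EMPTY)
    return grid


def parent_level(grid: Dict[Key, CellStats], m: int) -> Dict[Key, CellStats]:
    # Next layer up: each parent's statistics come only from its four children.
    parents: Dict[Key, CellStats] = {}
    for i in range(m // 2):
        for j in range(m // 2):
            kids = [grid[(2 * i + a, 2 * j + b)] for a in (0, 1) for b in (0, 1)]
            n = sum(c.n for c in kids)
            if n == 0:
                parents[(i, j)] = EMPTY
                continue
            parents[(i, j)] = CellStats(
                n,
                sum(c.n * c.mean for c in kids) / n,  # count-weighted mean
                min(c.lo for c in kids),
                max(c.hi for c in kids),
            )
    return parents


objects = [(0.1, 0.1, 2.0), (0.3, 0.2, 4.0), (0.6, 0.7, 10.0)]
level2 = leaf_level(objects, 4)      # 4 x 4 cells
level1 = parent_level(level2, 4)     # 2 x 2 cells
root = parent_level(level1, 2)[(0, 0)]
print(root)
```

Once the hierarchy is built, a query such as "how many objects, and what value range, in this region" is answered from cell summaries alone, without touching the individual objects.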

Grid-clustering: an efficient hierarchical clustering method for very large data sets

  • E. Schikuta
  • Proceedings of 13th International Conference on Pattern Recognition
  • 1996
A new approach to hierarchical clustering of very large data sets is presented with the GRIDCLUS algorithm, which uses a multidimensional grid data structure to organize the value space surrounding the pattern values rather than the patterns themselves.
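
The idea of organizing the value space rather than the patterns can be sketched with a simplified grid-based clusterer in that spirit. This is not a reproduction of GRIDCLUS's block structure or its exact merging procedure: it assumes a uniform grid with a hypothetical cell `width`, a hypothetical `min_count` threshold for dense cells, and merging of face-adjacent dense cells, visited from the most populated cell downward.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


def build_grid(points: List[Sequence[float]], width: float) -> Dict[Cell, List[int]]:
    # Index the value space: map each occupied grid cell to the points inside it.
    cells: Dict[Cell, List[int]] = defaultdict(list)
    for idx, p in enumerate(points):
        cells[tuple(math.floor(v / width) for v in p)].append(idx)
    return cells


def grid_cluster(points: List[Sequence[float]], width: float, min_count: int) -> List[int]:
    cells = build_grid(points, width)
    # All cells have the same volume, so the point count serves as a density index.
    dense = {c for c, members in cells.items() if len(members) >= min_count}
    cell_label: Dict[Cell, int] = {}
    next_label = 0
    # Seed clusters from the densest unassigned cell, then grow through
    # face-adjacent dense cells.
    for start in sorted(dense, key=lambda c: (-len(cells[c]), c)):
        if start in cell_label:
            continue
        cell_label[start] = next_label
        stack = [start]
        while stack:
            c = stack.pop()
            for d in range(len(c)):
                for step in (-1, 1):
                    nb = c[:d] + (c[d] + step,) + c[d + 1:]
                    if nb in dense and nb not in cell_label:
                        cell_label[nb] = next_label
                        stack.append(nb)
        next_label += 1
    labels = [-1] * len(points)  # points in sparse cells stay unclustered (-1)
    for c, lab in cell_label.items():
        for idx in cells[c]:
            labels[idx] = lab
    return labels


points = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.3, 0.1, 0.2),
          (1.1, 0.1, 0.1), (1.2, 0.2, 0.3), (1.3, 0.3, 0.1),
          (5.1, 5.1, 5.1), (5.2, 5.3, 5.4), (5.5, 5.5, 5.5),
          (9.5, 0.5, 0.5)]
print(grid_cluster(points, width=1.0, min_count=2))
```

All clustering decisions are made on cells; points are only consulted to count them and to read off the final labels.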

When Is "Nearest Neighbor" Meaningful?

The effect of dimensionality on the "nearest neighbor" problem is explored, and it is shown that under a broad set of conditions, as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point.
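
The effect can be checked empirically in one simple case, i.i.d. uniform data, by measuring the ratio (D_max − D_min) / D_min for a random query. As the nearest and farthest distances converge, the ratio shrinks toward zero. The sample size, seed, and dimensions below are arbitrary choices for illustration, and `relative_contrast` is a hypothetical helper name.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 500, seed: int = 0) -> float:
    # (D_max - D_min) / D_min for one random query against uniform random data.
    rng = random.Random(seed)
    data = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, p) for p in data]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

In low dimensions the nearest point is much closer than the farthest one; in high dimensions the ratio becomes small, which is what undermines distance-based notions of "nearest" and, with them, many clustering criteria.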

Efficient and Effective Clustering Methods for Spatial Data Mining

The analysis and experiments show that with the assistance of CLARANS, these two algorithms are very effective and can lead to discoveries that are difficult to find with current spatial data mining algorithms.