Automatic Subspace Clustering of High Dimensional Data

@article{Agrawal2005AutomaticSC,
  title={Automatic Subspace Clustering of High Dimensional Data},
  author={Rakesh Agrawal and Johannes Gehrke and Dimitrios Gunopulos and Prabhakar Raghavan},
  journal={Data Mining and Knowledge Discovery},
  year={2005},
  volume={11},
  pages={5--33}
}
Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates…
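A minimal sketch of the grid-based, bottom-up search for dense regions in subspaces that the abstract alludes to. It is an illustration under stated assumptions, not the authors' implementation: the function name `dense_units`, the parameters `xi` (intervals per dimension) and `tau` (density threshold as a fraction of the points), and the restriction to coordinates in [0, 1) are all choices made here. Pruning is done per subspace rather than per unit, which is a simplification, and grouping dense units into clusters and generating cluster descriptions are omitted.

```python
from collections import Counter
from itertools import combinations


def dense_units(points, xi, tau):
    """Map each subspace (tuple of dimension indices) to its set of dense cells.

    Assumption: every coordinate lies in [0, 1). A cell is dense when it
    holds more than tau * len(points) points.
    """
    n, d = len(points), len(points[0])
    # Interval index of every point in every dimension.
    cells = [tuple(min(int(x * xi), xi - 1) for x in p) for p in points]
    min_count = tau * n

    def dense_in(dims):
        # Count points per cell of the subspace spanned by dims.
        counts = Counter(tuple(c[i] for i in dims) for c in cells)
        return {unit for unit, k in counts.items() if k > min_count}

    # Level 1: one-dimensional subspaces that contain a dense interval.
    level = {}
    for i in range(d):
        units = dense_in((i,))
        if units:
            level[(i,)] = units

    result, k = {}, 1
    while level:
        result.update(level)
        k += 1
        # A k-dimensional subspace is a candidate only if every
        # (k-1)-dimensional projection already has a dense unit.
        candidates = set()
        for a, b in combinations(sorted(level), 2):
            merged = tuple(sorted(set(a) | set(b)))
            if len(merged) == k and all(
                sub in level for sub in combinations(merged, k - 1)
            ):
                candidates.add(merged)
        level = {}
        for dims in candidates:
            units = dense_in(dims)
            if units:
                level[dims] = units
    return result
```

The pruning relies on monotonicity: a cell that is dense in a k-dimensional subspace is also dense in each of its (k-1)-dimensional projections, so subspaces with a non-dense projection can be skipped without counting.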
Locally adaptive metrics for clustering high dimensional data
TLDR
An algorithm is introduced that discovers clusters in subspaces spanned by different combinations of dimensions via local weightings of features, whose values capture the relevance of features within the corresponding cluster.
Evaluation of Subspace Clustering of High Dimensional Data
Data clustering is an unsupervised method for extracting hidden patterns from huge datasets. Due to the sparsity of data points, conventional clustering algorithms do not scale well to cluster high…
A weighting k-modes algorithm for subspace clustering of categorical data
TLDR
In the proposed algorithm, an additional step is added to the k-modes clustering process to automatically compute the weight of all dimensions in each cluster by using complement entropy.
Mining Projected Clusters in High-Dimensional Spaces
TLDR
This work proposes a robust partitional distance-based projected clustering algorithm that detects projected clusters of low dimensionality embedded in a high-dimensional space and avoids computing distances in the full-dimensional space.
Subspace clustering for complex data
TLDR
This work introduces novel methods for effective subspace clustering on various types of complex data (vector data, imperfect data, and graph data) and proposes models whose solutions contain only non-redundant and, thus, valuable clusters.
A Comprehensive Study of Challenges and Approaches for Clustering High Dimensional Data
TLDR
This paper provides a short introduction to various approaches and challenges for high-dimensional data clustering.
Mining Significant Subspaces
As the number of dimensions increases, existing clustering methods in full feature space are not appropriate to cluster data in databases. Thus, subspace clustering has attracted more and…
Mining of high dimensional data using enhanced clustering approach
TLDR
The proposed work is designed to find projected clusters in high-dimensional space by adapting an improved method within the k-medoids algorithm; the main goal of the second method is to remove outliers, while the third method finds clusters in various subspaces.
High-Dimensional Clustering Method for High Performance Data Mining
TLDR
A new high-dimensional clustering method for high-performance data mining is proposed; it provides efficient cell creation and cell insertion algorithms using a space-partitioning technique and makes use of a filtering-based index structure using an approximation technique.
Subspace clustering through attribute clustering
Many recently proposed subspace clustering methods suffer from two severe problems. First, the algorithms typically scale exponentially with the data dimensionality or the subspace dimensionality of…
