ROCK: a robust clustering algorithm for categorical attributes
@article{Guha1999ROCKAR,
  title={ROCK: a robust clustering algorithm for categorical attributes},
  author={S. Guha and R. Rastogi and Kyuseok Shim},
  journal={Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337)},
  year={1999},
  pages={512-521}
}
We study clustering algorithms for data with Boolean and categorical attributes. [...] We develop a robust hierarchical clustering algorithm, ROCK, that employs links and not distances when merging clusters. Our methods naturally extend to non-metric similarity measures that are relevant in situations where a domain expert/similarity table is the only source of knowledge. In addition to presenting detailed complexity results for ROCK, we also conduct an experimental study with real-life as well as…
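The abstract's central idea, merging on links rather than distances, can be illustrated with a small sketch. It assumes the usual reading of a link as the number of neighbors two points have in common, where two points are neighbors when their similarity reaches a threshold. The Jaccard similarity, the `theta` parameter, the function names, and the toy baskets are all assumptions chosen for illustration and are not taken from the abstract. As a simplification, a point is not counted as its own neighbor, and the cluster-merging step itself is not shown.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of categorical items (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbor_sets(points, theta):
    """For each point, collect the indices of other points with similarity >= theta."""
    neighbors = [set() for _ in points]
    for i, j in combinations(range(len(points)), 2):
        if jaccard(points[i], points[j]) >= theta:
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors


def link_counts(points, theta):
    """Map each pair (i, j), i < j, to its number of common neighbors (if nonzero)."""
    neighbors = neighbor_sets(points, theta)
    links = {}
    for i, j in combinations(range(len(points)), 2):
        common = len(neighbors[i] & neighbors[j])
        if common:
            links[(i, j)] = common
    return links


# Hypothetical market-basket records, one set of items per transaction.
baskets = [
    frozenset({"milk", "bread", "eggs"}),
    frozenset({"milk", "bread", "butter"}),
    frozenset({"milk", "eggs", "butter"}),
    frozenset({"wine", "cheese"}),
]
links = link_counts(baskets, theta=0.5)
print(links)
```

In this toy data the three dairy baskets are pairwise neighbors, so each pair shares exactly one common neighbor, while the fourth basket has no neighbors and therefore no links. A link-based hierarchical method would prefer merging clusters with many cross links over clusters that merely contain one close pair.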
1,526 Citations
A robust and scalable clustering algorithm for mixed type attributes in large database environment
- Computer Science
- KDD '01
- 2001
- 585
A novel attribute weighting algorithm for clustering high-dimensional categorical data
- Mathematics, Computer Science
- Pattern Recognit.
- 2011
- 72
- PDF
A Unified Metric for Categorical and Numerical Attributes in Data Clustering
- Computer Science
- PAKDD
- 2013
- 10
- PDF
CLUC: a natural clustering algorithm for categorical datasets based on cohesion
- Computer Science
- SAC '06
- 2006
- 5
Incremental Algorithm to Cluster the Categorical Data with Frequency Based Similarity Measure
- Computer Science
- 2010
- 5
A k-means type clustering algorithm for subspace clustering of mixed numeric and categorical datasets
- Mathematics, Computer Science
- Pattern Recognit. Lett.
- 2011
- 45
- PDF
An effective dissimilarity measure for clustering of high-dimensional categorical data
- Mathematics, Computer Science
- Knowledge and Information Systems
- 2012
- 4
- Highly Influenced
MGR: An information theory based hierarchical divisive clustering algorithm for categorical data
- Computer Science
- Knowl. Based Syst.
- 2014
- 23
- Highly Influenced
- PDF
References
A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise
- Computer Science
- KDD
- 1996
- 15,123
- PDF
BIRCH: an efficient data clustering method for very large databases
- Computer Science
- SIGMOD '96
- 1996
- 4,581
- PDF
High-dimensional similarity joins
- Computer Science
- Proceedings 13th International Conference on Data Engineering
- 1997
- 70
- PDF
Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases
- Computer Science
- VLDB
- 1995
- 774
- PDF