ROCK: a robust clustering algorithm for categorical attributes

  • S. Guha, R. Rastogi, Kyuseok Shim
  • Published 1999
  • Computer Science
  • Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337)
  • We study clustering algorithms for data with Boolean and categorical attributes. [...] We develop a robust hierarchical clustering algorithm, ROCK, that employs links and not distances when merging clusters. Our methods naturally extend to non-metric similarity measures that are relevant in situations where a domain expert/similarity table is the only source of knowledge. In addition to presenting detailed complexity results for ROCK, we also conduct an experimental study with real-life as well as…
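
The idea of merging on links rather than distances can be illustrated with a minimal Python sketch. It assumes, following the usual description of ROCK, that two points are neighbors when their similarity is at least a threshold and that the link count of a pair is the number of neighbors they share. The Jaccard similarity, the names `jaccard`, `neighbors`, `links`, and `theta`, the sample baskets, and the choice not to count a point as its own neighbor are all illustrative assumptions, not details taken from the abstract above.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of categorical items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbors(points, theta):
    """Map each point index to the other indices with similarity >= theta."""
    nbrs = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if jaccard(points[i], points[j]) >= theta:
            nbrs[i].add(j)
            nbrs[j].add(i)
    return nbrs


def links(points, theta):
    """Count the common neighbors of every pair of points."""
    nbrs = neighbors(points, theta)
    return {
        (i, j): len(nbrs[i] & nbrs[j])
        for i, j in combinations(range(len(points)), 2)
    }


# Hypothetical market-basket style records (sets of categorical items).
baskets = [
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "c", "d"},
    {"x", "y", "z"},
]
link_counts = links(baskets, theta=0.5)
```

In this toy data the first three baskets are mutual neighbors, so each pair among them shares one common neighbor, while the fourth basket has no links to any of them. A hierarchical procedure in the spirit of the abstract would prefer to merge clusters with many links between them; the merge criterion itself is not sketched here.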
    1,526 Citations


    A robust and scalable clustering algorithm for mixed type attributes in large database environment
    • 585
    Limbo: A scalable algorithm to cluster categorical data
    • 9
    A novel attribute weighting algorithm for clustering high-dimensional categorical data
    • 72
    A Unified Metric for Categorical and Numerical Attributes in Data Clustering
    • 10
    A k-means type clustering algorithm for subspace clustering of mixed numeric and categorical datasets
    • 45
    An effective dissimilarity measure for clustering of high-dimensional categorical data
    • 4
    A New Clustering Algorithm for Categorical Attributes
    • 1
    MGR: An information theory based hierarchical divisive clustering algorithm for categorical data
    • 23


    A Clustering Algorithm for Categorical Attributes
    • 12
    Efficient and Effective Clustering Methods for Spatial Data Mining
    • 2,018
    High-dimensional similarity joins
    • 70
    A Database Interface for Clustering in Large Spatial Databases
    • 151
    Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases
    • 774
    Multilevel hypergraph partitioning: application in VLSI domain
    • 838
    Random sampling with a reservoir
    • 1,309