Corpus ID: 3007488

CLUSTERING LARGE DATA SETS WITH MIXED NUMERIC AND CATEGORICAL VALUES

@inproceedings{Huang1997CLUSTERINGLD,
  title={CLUSTERING LARGE DATA SETS WITH MIXED NUMERIC AND CATEGORICAL VALUES},
  author={Zhexue Huang},
  year={1997}
}
  • Zhexue Huang
  • Published 1997
  • Computer Science
  • Efficient partitioning of large data sets into homogeneous clusters is a fundamental problem in data mining. [...] In the algorithm, objects are clustered against k prototypes. A method is developed to dynamically update the k prototypes in order to maximise the intra-cluster similarity of objects. When applied to numeric data, the algorithm is identical to k-means. To assist interpretation of clusters, we use decision tree induction algorithms to create rules for clusters. These rules, together with…
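
The abstract's description of clustering objects against k prototypes can be illustrated with a minimal Python sketch. Several things here are assumptions, not details taken from the abstract: the dissimilarity measure (squared Euclidean distance on numeric attributes plus a weight `gamma` times the number of mismatched categorical attributes), the names `dissimilarity`, `k_prototypes`, `gamma`, `max_iter`, and `seed`, and the use of batch (Lloyd-style) prototype updates in place of the dynamic per-object updates the abstract mentions.

```python
import random
from collections import Counter


def dissimilarity(x_num, x_cat, p_num, p_cat, gamma):
    # Assumed measure: squared Euclidean distance on the numeric part
    # plus gamma times the count of mismatched categorical attributes.
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    mismatches = sum(a != b for a, b in zip(x_cat, p_cat))
    return numeric + gamma * mismatches


def k_prototypes(num, cat, k, gamma=1.0, max_iter=100, seed=0):
    # Batch sketch: assign every object to its nearest prototype,
    # then recompute each prototype (means for numeric, modes for categorical).
    rng = random.Random(seed)
    n = len(num)
    start = rng.sample(range(n), k)  # k distinct objects as initial prototypes
    protos = [(list(num[i]), list(cat[i])) for i in start]
    labels = [-1] * n
    for _ in range(max_iter):
        new_labels = [
            min(
                range(k),
                key=lambda j: dissimilarity(
                    num[i], cat[i], protos[j][0], protos[j][1], gamma
                ),
            )
            for i in range(n)
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [i for i in range(n) if labels[i] == j]
            if not members:
                continue  # keep the old prototype for an empty cluster
            means = [
                sum(col) / len(members)
                for col in zip(*(num[i] for i in members))
            ]
            modes = [
                Counter(col).most_common(1)[0][0]
                for col in zip(*(cat[i] for i in members))
            ]
            protos[j] = (means, modes)
    return labels, protos


if __name__ == "__main__":
    num = [[0.0], [0.1], [10.0], [10.1]]
    cat = [["a"], ["a"], ["b"], ["b"]]
    labels, protos = k_prototypes(num, cat, k=2)
    print(labels, protos)
```

If every object has an empty categorical part, the mismatch term vanishes and the sketch reduces to ordinary k-means, which matches the abstract's remark about numeric data.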
    441 Citations
    A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining
    • 525
    • Highly Influenced
    Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values
    • J. Huang
    • Computer Science
    • Data Mining and Knowledge Discovery
    • 2004
    • 1,937
    • Highly Influenced
    An iterative initial-points refinement algorithm for categorical data clustering
    • 76
    An alternative extension of the k-means algorithm for clustering categorical data
    • 169
    • Highly Influenced
    Clustering Algorithm for Incomplete Data Sets with Mixed Numeric and Categorical Attributes
    • 7
