Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values
@article{Huang2004ExtensionsTT,
  title={Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values},
  author={J. Huang},
  journal={Data Mining and Knowledge Discovery},
  year={2004},
  volume={2},
  pages={283--304}
}
The k-means algorithm is well known for its efficiency in clustering large data sets. However, working only on numeric values prohibits it from being used to cluster real world data containing categorical values. In this paper we present two algorithms which extend the k-means algorithm to categorical domains and domains with mixed numeric and categorical values. The k-modes algorithm uses a simple matching dissimilarity measure to deal with categorical objects, replaces the means of clusters…
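As a rough illustration of the simple matching dissimilarity measure mentioned in the abstract, the sketch below counts the attributes on which two categorical objects differ and assigns each object to its nearest cluster representative. It is not the paper's implementation. The function names (`matching_dissimilarity`, `cluster_mode`, `assign`) are hypothetical, and using the per-attribute most frequent value as the cluster representative is an assumption taken from the name "k-modes", not from the truncated abstract text.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical objects differ."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(objects):
    """Return the most frequent value of each attribute in a non-empty cluster.

    Assumption: this per-attribute mode plays the role that the mean
    plays in k-means.
    """
    if not objects:
        raise ValueError("cluster must contain at least one object")
    # zip(*objects) iterates over attribute columns.
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*objects))


def assign(objects, modes):
    """Assign each object to the index of its least dissimilar mode."""
    return [
        min(range(len(modes)), key=lambda j: matching_dissimilarity(obj, modes[j]))
        for obj in objects
    ]


if __name__ == "__main__":
    cluster = [("red", "small"), ("red", "large"), ("blue", "small")]
    print(matching_dissimilarity(("red", "small"), ("red", "large")))
    print(cluster_mode(cluster))
```

Alternating `assign` with a `cluster_mode` update for each cluster would give a k-means-style iteration over categorical data, under the assumptions stated above.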
1,964 Citations
An alternative extension of the k-means algorithm for clustering categorical data
- Mathematics
- 2004
- 169
An iterative initial-points refinement algorithm for categorical data clustering
- Computer Science
- Pattern Recognit. Lett.
- 2002
- 77
Clustering Categorical Data Using the K-Means Algorithm and the Attribute’s Relative Frequency
- Computer Science
- 2017
- 5
A k-Means-Like Algorithm for Clustering Categorical Data Using an Information Theoretic-Based Dissimilarity Measure
- Computer Science
- FoIKS
- 2016
- 13
Clustering Algorithm for Incomplete Data Sets with Mixed Numeric and Categorical Attributes
- Computer Science
- 2013
- 7
An improved k-prototypes clustering algorithm for mixed numeric and categorical data
- Mathematics, Computer Science
- Neurocomputing
- 2013
- 85
A dissimilarity measure for the k-Modes clustering algorithm
- Computer Science
- Knowl. Based Syst.
- 2012
- 95