# CLUSTERING LARGE DATA SETS WITH MIXED NUMERIC AND CATEGORICAL VALUES

@inproceedings{Huang1997CLUSTERINGLD, title={CLUSTERING LARGE DATA SETS WITH MIXED NUMERIC AND CATEGORICAL VALUES}, author={Zhexue Huang}, year={1997} }

Efficient partitioning of large data sets into homogeneous clusters is a fundamental problem in data mining. [...] In the algorithm, objects are clustered against k prototypes. A method is developed to dynamically update the k prototypes in order to maximise the intra-cluster similarity of objects. When applied to numeric data, the algorithm is identical to k-means. To assist interpretation of clusters, we use decision tree induction algorithms to create rules for clusters. These rules, together with…
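The abstract does not spell out the dissimilarity measure or the update schedule, so the following is only a minimal batch (Lloyd-style) k-prototypes sketch in pure Python, not the paper's exact procedure. It assumes the mixed dissimilarity is squared Euclidean distance on the numeric attributes plus a weight `gamma` times the number of categorical mismatches, and that a prototype is the per-attribute mean (numeric) and mode (categorical) of its cluster. The names `dissimilarity`, `update_prototype` and `k_prototypes` are hypothetical. With no categorical attributes the mismatch term vanishes and the loop is ordinary k-means, in line with the abstract's remark.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# A mixed-type object: (numeric attributes, categorical attributes).
Obj = Tuple[Tuple[float, ...], Tuple[str, ...]]


def dissimilarity(x: Obj, proto: Obj, gamma: float) -> float:
    """Assumed mixed measure: squared Euclidean distance on the numeric part
    plus gamma times the number of categorical mismatches."""
    num = sum((a - b) ** 2 for a, b in zip(x[0], proto[0]))
    cat = sum(a != b for a, b in zip(x[1], proto[1]))
    return num + gamma * cat


def update_prototype(members: Sequence[Obj]) -> Obj:
    """Per-attribute mean for numeric attributes, mode for categorical ones."""
    n = len(members)
    means = tuple(sum(col) / n for col in zip(*(m[0] for m in members)))
    modes = tuple(Counter(col).most_common(1)[0][0]
                  for col in zip(*(m[1] for m in members)))
    return means, modes


def k_prototypes(data: List[Obj], k: int, gamma: float = 1.0,
                 max_iter: int = 100, seed: int = 0):
    rng = random.Random(seed)
    protos = rng.sample(data, k)  # initial prototypes: k objects from the data
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # Assignment step: each object joins its least dissimilar prototype.
        new_labels = [min(range(k), key=lambda c: dissimilarity(x, protos[c], gamma))
                      for x in data]
        if new_labels == labels:
            break  # no object moved: converged
        labels = new_labels
        # Update step: recompute prototypes; an empty cluster keeps its old one.
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:
                protos[c] = update_prototype(members)
    return labels, protos
```

For example, `k_prototypes(data, k=2)` on two well-separated groups, one tagged `"a"` and one tagged `"b"`, puts each group in its own cluster. The choice of `gamma` controls how much categorical mismatches count relative to numeric distance, so in practice it would need tuning or scaling to the numeric attributes' spread.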

## 513 Citations

A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining

- Computer Science, DMKD
- 1997

This paper presents an algorithm, called k-modes, that extends the k-means paradigm to categorical domains: it introduces new dissimilarity measures for categorical objects, replaces the means of clusters with modes, and uses a frequency-based method to update modes during clustering so as to minimise the clustering cost function.
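The two k-modes ingredients named above can be sketched in a few lines of Python. This assumes the dissimilarity is simple matching (the count of attributes on which two records differ); the function names are hypothetical. Because that cost decomposes attribute by attribute, taking the most frequent category of each attribute gives a mode that minimises the total matching dissimilarity to the cluster's members.

```python
from collections import Counter
from typing import Sequence, Tuple

Record = Tuple[str, ...]


def matching_dissimilarity(x: Record, y: Record) -> int:
    """Simple matching: number of attributes on which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def frequency_mode(members: Sequence[Record]) -> Record:
    """Most frequent category per attribute (ties go to the value seen first)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))


def clustering_cost(data: Sequence[Record], labels: Sequence[int],
                    modes: Sequence[Record]) -> int:
    """Total matching dissimilarity of each record to its cluster's mode."""
    return sum(matching_dissimilarity(x, modes[lab]) for x, lab in zip(data, labels))
```

For instance, the mode of `("red", "small", "round")`, `("red", "large", "round")` and `("blue", "small", "round")` is `("red", "small", "round")`, and the cluster's cost against that mode is 2.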

Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values

- Computer Science, Data Mining and Knowledge Discovery
- 2004

Two algorithms which extend the k-means algorithm to categorical domains and domains with mixed numeric and categorical values are presented and are shown to be efficient when clustering large data sets, which is critical to data mining applications.

Design and analysis of clustering algorithms for numerical, categorical and mixed data

- Computer Science
- 2010

The purpose of this research is to design and analyse clustering algorithms for numerical, categorical and mixed data sets, and a main part of this thesis is devoted to normalisation.

An iterative initial-points refinement algorithm for categorical data clustering

- Computer Science, Pattern Recognition Letters
- 2002

A New Clustering Algorithm of Hybrid Data According to Weights of Attributes

- Computer Science
- 2016

This paper introduces an improved algorithm, based mainly on the k-prototypes algorithm, for effectively clustering large hybrid data while also taking the weights of attributes into account.

An improved k-prototypes clustering algorithm for mixed numeric and categorical data

- Computer Science, Neurocomputing
- 2013

Modified K-Means Algorithm for Effective Clustering of Categorical Data Sets

- Computer Science
- 2014

The traditional k-means algorithm is well known for its clustering ability and its efficiency on large data sets. However, it is well suited only to numeric values and cannot be effectively…

An alternative extension of the k-means algorithm for clustering categorical data

- Computer Science
- 2004

This paper shows how to apply the notion of “cluster centers” to a dataset of categorical objects and how to use this notion to formulate the clustering of categorical objects as a partitioning problem.

Clustering Algorithm for Incomplete Data Sets with Mixed Numeric and Categorical Attributes

- Computer Science
- 2013

An improved k-prototypes algorithm is proposed that employs a new dissimilarity measure for incomplete data sets with mixed numeric and categorical attributes, together with a new approach, based on nearest neighbours, for selecting k objects as the initial prototypes.

Integrated Framework Using Frequent Pattern for Clustering Numeric and Nominal Data Sets

- Computer Science
- 2016

An integrated framework based on frequent pattern analysis, the frequent pattern-based framework for mixed data clustering (FPMC) algorithm, is proposed; it clusters mixed data effectively by performing a one-time clustering together with attribute reduction.

## References

Showing references 1-10 of 23.

Automated Construction of Classifications: Conceptual Clustering Versus Numerical Taxonomy

- Computer Science, IEEE Transactions on Pattern Analysis and Machine Intelligence
- 1983

A method for the automated construction of classifications, called conceptual clustering, is described and compared with methods used in numerical taxonomy. In conceptual clustering, descriptive concepts are conjunctive statements involving relations on selected object attributes, optimized according to an assumed global criterion of clustering quality.

Some methods for classification and analysis of multivariate observations

- Mathematics
- 1967

The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give…
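The sequential flavour of this process can be sketched in Python: start from k seed points, then add each remaining point to the group with the nearest mean and adjust that mean to account for the new point. Seeding with the first k points, the single pass, and the function name `sequential_k_means` are assumptions of this sketch rather than details taken from the abstract.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sequential_k_means(points: Sequence[Point], k: int):
    """One pass: seed k groups with the first k points, then assign each later
    point to its nearest mean and update that mean incrementally."""
    means: List[List[float]] = [list(p) for p in points[:k]]
    counts = [1] * k
    labels = list(range(k))
    for p in points[k:]:
        # Nearest mean under squared Euclidean distance.
        c = min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, means[j])))
        counts[c] += 1
        # Running-mean update: m <- m + (x - m) / n
        means[c] = [m + (x - m) / counts[c] for m, x in zip(means[c], p)]
        labels.append(c)
    return labels, [tuple(m) for m in means]
```

On the one-dimensional points `0, 10, 1, 9, 2` with `k=2`, the groups end up as `{0, 1, 2}` and `{10, 9}` with means 1.0 and 9.5. The incremental update avoids recomputing a mean from scratch after every assignment.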

c-means clustering with the $\ell_1$ and $\ell_\infty$ norms

- Computer Science
- 1991

This method broadens the range of applications of the FCM (fuzzy c-means) family by enabling users to match discontinuous multidimensional numerical data structures with similarity measures that have non-hyperelliptical topologies.

C4.5: Programs for Machine Learning

- Computer Science
- 1992

A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
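C4.5 chooses the attribute to split on by gain ratio: the information gain of a split divided by the split information (the entropy of the partition the split induces). The Python fragment below computes only that criterion for one categorical attribute; it is not a tree learner, and the names `entropy` and `gain_ratio` are hypothetical. When decision trees are used to describe clusters, as the abstract above proposes, `labels` would be the cluster assignments.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of the label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain of splitting on `values`, divided by split information."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the partition itself
    return gain / split_info if split_info > 0 else 0.0
```

An attribute that separates two classes perfectly into two equal halves scores 1.0, while one that leaves both halves equally mixed scores 0.0. Dividing by split information penalises attributes with many distinct values, which plain information gain tends to favour.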

Programs for Machine Learning

- Computer Science
- 1994

In his new book, C4.5: Programs for Machine Learning, Quinlan has put together a definitive, much needed description of his complete system, including the latest developments, which will be a welcome addition to the library of many researchers and students.

A General Coefficient of Similarity and Some of Its Properties

- Computer Science
- 1971

A general coefficient measuring the similarity between two sampling units is defined. The matrix of similarities between all pairs of sample units is shown to be positive semidefinite (except…
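A general coefficient of this kind is a natural fit for mixed numeric and categorical data. The Python sketch below follows the usual form of Gower's coefficient: it averages per-attribute scores over the attributes that can be compared. Numeric attributes score $1 - |x_k - y_k| / R_k$, where $R_k$ is the attribute's range; categorical attributes score 1 on a match and 0 otherwise; and an attribute with a missing value on either side is skipped. The encoding (ranges passed in, `None` for categorical attributes and for missing values) and the name `gower_similarity` are assumptions. The coefficient's special treatment of dichotomous presence/absence attributes is omitted.

```python
from typing import Optional, Sequence, Union

Value = Union[float, str, None]  # None marks a missing value


def gower_similarity(x: Sequence[Value], y: Sequence[Value],
                     ranges: Sequence[Optional[float]]) -> float:
    """ranges[k] is the sample range R_k of numeric attribute k,
    or None if attribute k is categorical (assumed encoding)."""
    total = 0.0   # sum of per-attribute scores
    weight = 0.0  # number of attributes actually compared
    for a, b, r in zip(x, y, ranges):
        if a is None or b is None:
            continue  # missing value: attribute contributes no weight
        if r is None:
            s = 1.0 if a == b else 0.0  # categorical: match or mismatch
        elif r > 0:
            s = 1.0 - abs(a - b) / r  # numeric: range-scaled difference
        else:
            s = 1.0  # constant attribute: values are necessarily equal
        total += s
        weight += 1.0
    if weight == 0:
        raise ValueError("no attribute can be compared")
    return total / weight
```

For `x = (1.0, "red", None)`, `y = (3.0, "red", 5.0)` and ranges `(4.0, None, 10.0)`, the numeric attribute scores 0.5, the categorical one scores 1, the third is skipped, and the similarity is 0.75.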

Discrimination and Classification

- Computer Science
- 1981

Presents different approaches to discrimination and classification problems from a statistical perspective. Provides computer projects concentrating on the most widely used and important algorithms,…

Genetic Algorithms in Search Optimization and Machine Learning

- Computer Science
- 1988

This book brings together the computer techniques, mathematical tools, and research results that will enable both students and practitioners to apply genetic algorithms to problems in many fields.