On Clustering Binary Data

Abstract

Clustering is the problem of identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity classes. This paper studies the problem of clustering binary data, which arises in market-basket datasets, where transactions contain items, and in document datasets, where each document is represented as a “bag of words”. The contribution of the paper is twofold. First, a new clustering model is presented. The model treats the data and features equally, based on their symmetric association relations, and explicitly describes the data assignments as well as the feature assignments. An iterative alternating least-squares procedure is used for optimization. Second, a unified view of binary data clustering is presented by examining the connections among various clustering criteria.
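The alternating procedure mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the objective, so the code below assumes a block-model form, minimizing ||W - A X B^T||_F^2, where W is the binary data matrix, A and B are one-hot data and feature assignment matrices, and X holds the block means. This objective, the function names (block_means, objective, cocluster_binary), and the parameters k, c, n_iter, and seed are all hypothetical choices for illustration and should not be read as the paper's exact formulation.

```python
import numpy as np

def block_means(W, rows, cols, k, c):
    # X[p, q] = mean of W over rows in cluster p and columns in cluster q.
    A = np.eye(k)[rows]                      # n x k one-hot data assignments
    B = np.eye(c)[cols]                      # m x c one-hot feature assignments
    sums = A.T @ W @ B
    counts = np.outer(A.sum(axis=0), B.sum(axis=0))
    # Empty blocks get mean 0 (an arbitrary convention for this sketch).
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)

def objective(W, rows, cols, X):
    # Squared Frobenius error between W and its block approximation.
    return float(((W - X[rows][:, cols]) ** 2).sum())

def cocluster_binary(W, k, c, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    W = np.asarray(W, dtype=float)
    rows = rng.integers(k, size=W.shape[0])  # random initial data clusters
    cols = rng.integers(c, size=W.shape[1])  # random initial feature clusters
    history = []
    for _ in range(n_iter):
        # Fix the feature assignments, reassign each data point.
        X = block_means(W, rows, cols, k, c)
        P = X[:, cols]                       # k x m row prototypes
        rows = ((W[:, None, :] - P[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        # Fix the data assignments, reassign each feature.
        X = block_means(W, rows, cols, k, c)
        Q = X[rows]                          # n x c column prototypes
        cols = ((W[:, :, None] - Q[:, None, :]) ** 2).sum(axis=0).argmin(axis=1)
        X = block_means(W, rows, cols, k, c)
        history.append(objective(W, rows, cols, X))
    return rows, cols, X, history
```

Calling cocluster_binary(W, k=2, c=2) on a small 0/1 matrix returns the data labels, the feature labels, the block-mean matrix, and the objective value after each iteration. Because each step minimizes the same squared error with the other quantities held fixed, the recorded objective never increases, although the result depends on the random initialization and may be a local minimum.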

DOI: 10.1137/1.9781611972757.54

Cite this paper

@inproceedings{Li2005OnCB, title={On Clustering Binary Data}, author={Tao Li}, booktitle={SDM}, year={2005} }