# A Unified View on Clustering Binary Data

@article{Li2005AUV, title={A Unified View on Clustering Binary Data}, author={Tao Li}, journal={Machine Learning}, year={2005}, volume={62}, pages={199-215} }

Clustering is the problem of identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity classes. This paper studies the problem of clustering binary data. Binary data have been occupying a special place in the domain of data analysis. A unified view of binary data clustering is presented by examining the connections among various clustering criteria. Experimental studies are conducted to empirically verify the…

## 45 Citations

A Comparison of Categorical Attribute Data Clustering Methods

- Computer Science
- S+SSPR
- 2014

A novel clustering algorithm based on local search for this objective function is designed and compared against six existing algorithms on well-known data sets; it provides better clustering quality than the other iterative methods at the cost of higher time complexity.

On multivariate binary data clustering and feature weighting

- Computer Science
- Comput. Stat. Data Anal.
- 2010

A new density based clustering algorithm for Binary Data sets

- Computer Science
- 2014 International Conference on High Performance Computing and Applications (ICHPCA)
- 2014

A density-based clustering algorithm is proposed to effectively cluster binary datasets, and it is observed that the proposed algorithm can effectively cluster both correlated and random binary datasets.

Clustering Binary Data with Bernoulli Mixture Models

- Computer Science
- 2014

This paper reviews the development and application of Bernoulli mixture models to clustering binary data and examines both Bayesian and non-Bayesian approaches to this model.

On combining multiple clusterings: an overview and a new perspective

- Computer Science
- Applied Intelligence
- 2009

This paper first summarizes different application scenarios of combining multiple clusterings and provides a new perspective of viewing the problem as a categorical clustering problem, then shows the connections between various consensus and clustering criteria and proposes a new method to determine the final clustering.

Nearest Neighbor Median Shift Clustering for Binary Data

- Computer Science
- ICANN
- 2021

This paper describes the theory and practice behind a new modal clustering method for binary data based on the nearest neighbor median shift, which can accurately discover the location of clusters in binary data, with theoretical and experimental analyses.

Mining Projected Clusters in High-Dimensional Spaces

- Computer Science
- IEEE Transactions on Knowledge and Data Engineering
- 2009

This work proposes a robust partitional distance-based projected clustering algorithm that is capable of detecting projected clusters of low dimensionality embedded in a high-dimensional space and that avoids computing distances in the full-dimensional space.

Mining of high dimensional data using enhanced clustering approach

- Computer Science
- 2018

The proposed work is designed to find projected clusters in high-dimensional space by adapting an improved method within the k-medoids algorithm; the main goal of the second method is to remove outliers, while the third method finds clusters in various spaces.

Comparison of Selected Methods for Document Clustering

- Computer Science
- AWIC
- 2011

Experiments with document clustering have proved that, from the point of view of entropy and purity, the direct method provides the best results and, as regards computing time, the repeated bisection (divisive) method has been the fastest.

Convex clustering for binary data

- Computer Science
- Adv. Data Anal. Classif.
- 2019

An efficient algorithm that solves the optimization problem using the majorization-minimization algorithm and the alternating direction method of multipliers is provided; its good performance is confirmed, and real data analysis demonstrates the practical usefulness of the proposed method.

## References


A general model for clustering binary data

- Computer Science
- KDD '05
- 2005

A general binary data clustering model is presented that treats the data and features equally, based on their symmetric association relations, and explicitly describes the data assignments as well as feature assignments.

Minimum entropy data partitioning

- Computer Science
- 1999

It is shown that minimisation of partition entropy can be used to estimate the number and structure of probable data generators and the resultant analyser may be regarded as a radial-basis function classifier.

Entropy-based criterion in categorical clustering

- Computer Science
- ICML
- 2004

It is shown that the entropy-based criterion can be derived in the formal framework of probabilistic clustering models, and the connection between the criterion and the approach based on dissimilarity coefficients is established.

K-means clustering in a low-dimensional Euclidean space

- Computer Science
- 1994

A procedure is developed for clustering objects in a low-dimensional subspace of the column space of an objects by variables data matrix. The method is based on the K-means criterion and seeks the…

Criterion functions for document clustering

- Computer Science
- 2005

This paper focuses on a class of clustering algorithms that treat the clustering problem as an optimization process that seeks to maximize or minimize a particular clustering criterion function defined over the entire clustering solution.

CACTUS—clustering categorical data using summaries

- Computer Science
- KDD '99
- 1999

This paper introduces a novel formalization of a cluster for categorical attributes by generalizing a definition of a cluster for numerical attributes and describes a very fast summarization-based algorithm called CACTUS that discovers exactly such clusters in the data.

Clustering categorical data: an approach based on dynamical systems

- Computer Science
- The VLDB Journal
- 2000

This work describes a novel approach for clustering collections of sets, and its application to the analysis and mining of categorical data, based on an iterative method for assigning and propagating weights on the categorical values in a table.

Probabilistic Aspects in Cluster Analysis

- Computer Science
- 1989

The historical evolution shows a surprising trend from an algorithmic, heuristic and applications oriented point of view to a more basic, theory oriented investigation of the structural, mathematical and statistical properties of clustering methods.

IFD: Iterative Feature and Data Clustering

- Computer Science
- SDM
- 2004

The convergence property of the clustering algorithm is shown, its connections with various existing approaches are discussed, and extensive experimental results on both synthetic and real data sets show the effectiveness of the IFD algorithm.