A Unified View on Clustering Binary Data

@article{Li2005AUV,
  title={A Unified View on Clustering Binary Data},
  author={Tao Li},
  journal={Machine Learning},
  year={2005},
  volume={62},
  pages={199--215}
}
  • Tao Li
  • Published 1 March 2006
  • Computer Science
  • Machine Learning
Clustering is the problem of identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity classes. This paper studies the problem of clustering binary data. Binary data have long occupied a special place in the domain of data analysis. A unified view of binary data clustering is presented by examining the connections among various clustering criteria. Experimental studies are conducted to empirically verify the…
A Comparison of Categorical Attribute Data Clustering Methods
TLDR
A novel clustering algorithm based on local search for this objective function is designed and compared against six existing algorithms on well-known data sets; it provides better clustering quality than the other iterative methods at the cost of higher time complexity.
On multivariate binary data clustering and feature weighting
  • N. Bouguila
  • Computer Science
    Comput. Stat. Data Anal.
  • 2010
A new density based clustering algorithm for Binary Data sets
TLDR
A density-based clustering algorithm is proposed to effectively cluster binary datasets, and it is observed that the proposed algorithm can effectively cluster both correlated and random binary datasets.
Clustering Binary Data with Bernoulli Mixture Models
TLDR
This paper reviews the development and application of Bernoulli mixture models to clustering binary data and examines both Bayesian and non-Bayesian approaches to this model.
On combining multiple clusterings: an overview and a new perspective
TLDR
This paper first summarizes different application scenarios of combining multiple clusterings and provides a new perspective of viewing the problem as a categorical clustering problem, then shows the connections between various consensus and clustering criteria and proposes a new method to determine the final clustering.
Nearest Neighbor Median Shift Clustering for Binary Data
TLDR
Describes the theory and practice behind a new modal clustering method for binary data based on the nearest neighbor median shift, which can accurately discover the location of clusters in binary data, supported by theoretical and experimental analyses.
Mining Projected Clusters in High-Dimensional Spaces
TLDR
This work proposes a robust partitional distance-based projected clustering algorithm capable of detecting projected clusters of low dimensionality embedded in a high-dimensional space and avoids the computation of the distance in the full-dimensional space.
Mining of high dimensional data using enhanced clustering approach
TLDR
The proposed work is designed to find projected clusters in high-dimensional space by adapting an improved method within the k-medoids algorithm; the main goal of the second method is to remove outliers, while the third method finds clusters in various subspaces.
Comparison of Selected Methods for Document Clustering
TLDR
Experiments with document clustering have proved that, from the point of view of entropy and purity, the direct method provides the best results and, as regards computing time, the repeated bisection (divisive) method has been the fastest.
Convex clustering for binary data
TLDR
An efficient algorithm is provided to solve the optimization problem using the majorization-minimization algorithm and the alternating direction method of multipliers; its good performance is confirmed, and real data analysis demonstrates the practical usefulness of the proposed method.

References

SHOWING 1-10 OF 53 REFERENCES
A general model for clustering binary data
  • Tao Li
  • Computer Science
    KDD '05
  • 2005
TLDR
A general binary data clustering model is presented that treats the data and features equally, based on their symmetric association relations, and explicitly describes the data assignments as well as feature assignments.
Maximum certainty data partitioning
Minimum entropy data partitioning
TLDR
It is shown that minimisation of partition entropy can be used to estimate the number and structure of probable data generators and the resultant analyser may be regarded as a radial-basis function classifier.
Entropy-based criterion in categorical clustering
TLDR
It is shown that the entropy-based criterion can be derived in the formal framework of probabilistic clustering models and the connection between the criterion and the approach based on dissimilarity coefficients is established.
K-means clustering in a low-dimensional Euclidean space
A procedure is developed for clustering objects in a low-dimensional subspace of the column space of an objects by variables data matrix. The method is based on the K-means criterion and seeks the
Criterion functions for document clustering
TLDR
Focuses on a class of clustering algorithms that treat the clustering problem as an optimization process that seeks to maximize or minimize a particular clustering criterion function defined over the entire clustering solution.
CACTUS—clustering categorical data using summaries
TLDR
This paper introduces a novel formalization of a cluster for categorical attributes by generalizing a definition of a cluster for numerical attributes and describes a very fast summarization-based algorithm called CACTUS that discovers exactly such clusters in the data.
Clustering categorical data: an approach based on dynamical systems
TLDR
This work describes a novel approach for clustering collections of sets, and its application to the analysis and mining of categorical data, based on an iterative method for assigning and propagating weights on the categorical values in a table.
Probabilistic Aspects in Cluster Analysis
TLDR
The historical evolution shows a surprising trend from an algorithmic, heuristic and applications oriented point of view to a more basic, theory oriented investigation of the structural, mathematical and statistical properties of clustering methods.
IFD: Iterative Feature and Data Clustering
TLDR
The convergence property of the clustering algorithm is shown, its connections with various existing approaches are discussed, and extensive experimental results on both synthetic and real data sets show the effectiveness of the IFD algorithm.