Comparing partitions

  • Lawrence J. Hubert and Phipps Arabie
  • Journal of Classification
The problem of comparing two different partitions of a finite set of objects reappears continually in the clustering literature. We begin by reviewing a well-known measure of partition correspondence often attributed to Rand (1971), discuss the issue of correcting this index for chance, and note that a recent normalization strategy developed by Morey and Agresti (1984) and adopted by others (e.g., Milligan and Cooper 1985) is based on an incorrect assumption. Then, the general problem of…
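As a concrete reference point, the Rand index and the chance-corrected form associated with Hubert and Arabie can both be computed from the contingency table of two partitions. The following is a minimal Python sketch, assuming each partition is given as an equal-length sequence of cluster labels; the function names are hypothetical. The correction follows the (index - expected) / (maximum - expected) template, with the expected value taken under the permutation model (cluster sizes held fixed).

```python
from collections import Counter
from math import comb


def pair_counts(labels_a, labels_b):
    """Pair-count summaries of the contingency table of two partitions.

    Returns (total pairs, pairs together in both, pairs together in A,
    pairs together in B).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two objects to form a pair")
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    return comb(n, 2), together_both, together_a, together_b


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which the two partitions agree."""
    total, both, same_a, same_b = pair_counts(labels_a, labels_b)
    apart_both = total - same_a - same_b + both  # pairs split in both partitions
    return (both + apart_both) / total


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected Rand index under the permutation model."""
    total, both, same_a, same_b = pair_counts(labels_a, labels_b)
    expected = same_a * same_b / total  # expected `both` with cluster sizes fixed
    maximum = (same_a + same_b) / 2
    if maximum == expected:  # only when both are one cluster, or both all singletons
        return 1.0
    return (both - expected) / (maximum - expected)


print(rand_index([0, 0, 1, 1], [0, 0, 1, 2]))           # 0.8333...
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # 0.5714...
```

Here the raw index is 5/6, while the adjusted value is 4/7: subtracting the agreement expected by chance lowers the score.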
Comparing clusterings: an information based distance
This paper proposes an information-theoretic criterion for comparing two partitions, or clusterings, of the same data set. The criterion, called variation of information (VI), measures the amount of…
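Variation of information can be written as VI(C, C') = H(C) + H(C') - 2 I(C; C'), where H is the entropy of a clustering's cluster-size distribution and I is the mutual information between the two clusterings. Below is a minimal Python sketch, assuming label sequences as input and natural logarithms (so VI is in nats); the helper names are illustrative.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Entropy (nats) of the cluster-size distribution of one partition."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(labels_a, labels_b):
    """Mutual information (nats) between two partitions of the same objects."""
    n = len(labels_a)
    size_a, size_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    return sum(
        (n_ij / n) * log(n_ij * n / (size_a[a] * size_b[b]))
        for (a, b), n_ij in joint.items()
    )


def variation_of_information(labels_a, labels_b):
    """VI = H(A) + H(B) - 2 I(A; B); 0 means identical partitions."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty labelings of the same objects")
    vi = entropy(labels_a) + entropy(labels_b) - 2 * mutual_information(labels_a, labels_b)
    return max(vi, 0.0)  # clamp tiny negative values caused by rounding
```

Identical partitions give 0 (up to rounding), whatever the label names. Two balanced, independent splits such as `[0, 0, 1, 1]` and `[0, 1, 0, 1]` give 2 ln 2, about 1.386.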
Relational Generalizations of Cluster Validity Indices
This work generalizes three well-known validity indices (the modified Hubert's Gamma, Xie-Beni, and generalized Dunn's indices) to relational data and develops a framework for converting many other validity indices to relational form.
Adjusting for Chance Clustering Comparison Measures
This paper solves the key technical challenge of analytically computing the expected value and variance of generalized information-theoretic (IT) measures, and proposes guidelines for using the adjusted Rand index (ARI) and adjusted mutual information (AMI) as external validation indices.
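The paper computes expected values analytically. As a simpler, swapped-in stand-in that shows the same chance-correction idea, the expected mutual information can be estimated by Monte Carlo, shuffling one labeling to simulate the permutation model. This is a minimal Python sketch, not the paper's method; the normalization by the arithmetic mean of the two entropies, the permutation count, and the function names are all assumptions.

```python
import random
from collections import Counter
from math import log


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(labels_a, labels_b):
    n = len(labels_a)
    size_a, size_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    return sum(
        (n_ij / n) * log(n_ij * n / (size_a[a] * size_b[b]))
        for (a, b), n_ij in joint.items()
    )


def adjusted_mi_monte_carlo(labels_a, labels_b, n_perm=200, seed=0):
    """Hypothetical helper: (MI - E[MI]) / (mean entropy - E[MI]),
    with E[MI] estimated by shuffling labels_b (permutation model)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same objects")
    rng = random.Random(seed)
    observed = mutual_information(labels_a, labels_b)
    shuffled = list(labels_b)
    expected = 0.0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # keeps cluster sizes, randomizes membership
        expected += mutual_information(labels_a, shuffled)
    expected /= n_perm
    upper = (entropy(labels_a) + entropy(labels_b)) / 2
    if upper == expected:  # degenerate case, e.g. both labelings have one cluster
        return 1.0
    return (observed - expected) / (upper - expected)
```

Identical labelings score about 1, while labelings that share no information score at or below 0. Because the expectation is estimated, results vary slightly with `seed` and `n_perm`; an analytical expectation avoids that noise.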
Comparing Two Partitions of Non-Equal Sets of Units
Rand (1971) proposed what has since become a well-known index for comparing two partitions obtained on the same set of units. The index takes a value in the interval between 0 and 1, where a higher…
Understanding partition comparison indices based on counting object pairs
Overall indices based on pair counting are sensitive to cluster-size imbalance: they tend to reflect the degree of agreement on the large clusters and provide little to no information about the smaller ones.
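A small hypothetical example illustrates the imbalance effect with the Rand index. One partition has a cluster of 90 objects and a cluster of 10; the other keeps the large cluster intact but breaks the small cluster into singletons. The data and function name are invented for illustration.

```python
from collections import Counter
from itertools import combinations
from math import comb


def rand_index(labels_a, labels_b):
    """(pairs together in both + pairs apart in both) / all pairs."""
    total = comb(len(labels_a), 2)
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    same_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    same_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    return (total - same_a - same_b + 2 * both) / total


# Hypothetical data: one large cluster (90 objects) and one small cluster (10).
reference = [0] * 90 + [1] * 10
# Same large cluster; the small cluster is broken into 10 singletons.
shattered = [0] * 90 + list(range(1, 11))

# How many within-cluster pairs of the small cluster are still together?
small = [i for i, c in enumerate(reference) if c == 1]
kept_pairs = sum(1 for i, j in combinations(small, 2) if shattered[i] == shattered[j])

print(rand_index(reference, shattered))       # ~0.991 despite the broken small cluster
print(kept_pairs, "of", comb(len(small), 2))  # 0 of 45
```

The index is 4905/4950 even though none of the 45 within-cluster pairs of the small cluster survives: almost all pairs involve the large cluster, so its agreement dominates the score.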
Comparing hard and overlapping clusterings
A corrected-for-chance measure (13AGRI) capable of comparing exclusive hard, fuzzy/probabilistic, non-exclusive hard, and possibilistic clusterings is developed, and 13AGRI is proved to be equivalent to the adjusted Rand index (ARI) in the exclusive hard domain.
A modification of the k-means method for quasi-unsupervised learning
This paper builds on a modification of the celebrated k-means method that uses a similar alternating optimization procedure, endowed with additive partition weights that control the size of the clusters formed; these weights are adjusted by means of the Levenberg-Marquardt algorithm. The paper then proposes several further variations on this modification.
An Extension of the Infinite Relational Model Incorporating Interaction between Objects
This paper extends the infinite relational model (IRM) with a subset mechanism that selects part of the data according to the interactions among objects, and presents the posterior probabilities needed to learn the model from data by collapsed Gibbs sampling.
Extending the Rand, adjusted Rand and Jaccard indices to fuzzy partitions
  • R. Brouwer
  • Journal of Intelligent Information Systems
  • 2008
This paper examines some commonly used clustering comparison measures, including the Rand index (RI), the adjusted Rand index (ARI), and the Jaccard index (JI), which are already defined for crisp clusterings, and extends them to fuzzy clusterings, giving the measures FRI, FARI and FJI.
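One way to see how a pair-counting index can extend to fuzzy partitions is to replace the crisp "same cluster" indicator for each pair with a degree of co-membership. The sketch below uses the dot product of two objects' membership rows as that degree and scores each pair by 1 - |s - t|; this formulation is an illustrative assumption and not necessarily the definition used in the paper. Membership rows are assumed to sum to 1. For one-hot (crisp) memberships the degrees are 0 or 1, and the score reduces to the ordinary Rand index.

```python
from itertools import combinations


def co_membership(u_i, u_j):
    """Degree to which two objects share a cluster: dot product of membership rows."""
    return sum(a * b for a, b in zip(u_i, u_j))


def fuzzy_rand_index(U, V):
    """Average pairwise agreement 1 - |s_ij - t_ij| between two fuzzy partitions.

    U and V are lists of membership rows (one row per object, each summing to 1);
    they may have different numbers of clusters.
    """
    n = len(U)
    if n != len(V) or n < 2:
        raise ValueError("need at least two objects, described in both partitions")
    agreement = 0.0
    for i, j in combinations(range(n), 2):
        s = co_membership(U[i], U[j])  # co-membership in partition U
        t = co_membership(V[i], V[j])  # co-membership in partition V
        agreement += 1.0 - abs(s - t)
    return agreement / (n * (n - 1) / 2)


# One-hot encodings of the crisp labelings [0, 0, 1, 1] and [0, 0, 1, 2].
crisp_u = [[1, 0], [1, 0], [0, 1], [0, 1]]
crisp_v = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(fuzzy_rand_index(crisp_u, crisp_v))  # 0.8333..., the crisp Rand index
```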
Unsupervised extra trees: a stochastic approach to compute similarities in heterogeneous data
An empirical study shows that the approach based on unsupervised extra trees (UET) outperforms existing methods in some cases and reduces the amount of preprocessing typically needed when dealing with real-world datasets.