Characterization and evaluation of similarity measures for pairs of clusterings

@article{Pfitzner2008CharacterizationAE,
  title={Characterization and evaluation of similarity measures for pairs of clusterings},
  author={Darius Pfitzner and Richard Leibbrandt and David M. W. Powers},
  journal={Knowledge and Information Systems},
  year={2008},
  volume={19},
  pages={361--394}
}
In evaluating the results of cluster analysis, it is common practice to make use of a number of fixed heuristics rather than to compare a data clustering directly against an empirically derived standard, such as a clustering empirically obtained from human informants. Given the dearth of research into techniques to express the similarity between clusterings, there is broad scope for fundamental research in this area. In defining the comparative problem, we identify two types of worst-case…
The Impact of Random Models on Clustering Similarity
TLDR
It is demonstrated that the choice of random model can have a drastic impact on the ranking of similar clustering pairs and on the evaluation of a clustering method with respect to a random baseline; thus, the choice of random clustering model should be carefully justified.
On comparing clusterings: an element-centric framework unifies overlaps and hierarchy
TLDR
The strengths of the proposed element-centric framework are illustrated by revealing new insights into the organization of clusters in two applications: the improved classification of schizophrenia based on the overlapping and hierarchical community structure of fMRI brain networks, and the disentanglement of various social homophily factors in Facebook social networks.
Element-centric clustering comparison unifies overlaps and hierarchy
TLDR
This work unifies the comparison of disjoint, overlapping, and hierarchically structured clusterings by proposing a new element-centric framework: elements are compared based on the relationships induced by the cluster structure, as opposed to the traditional cluster-centric philosophy.
Towards a Classification of Binary Similarity Measures
TLDR
The paper proposes a method for the comparative analysis of similarity measures, based on set-theoretic representations of these measures and a comparison of the algebraic properties of those representations, and shows a relationship between clustering results and the classification of measures by their properties.
Characterizing and Comparing External Measures for the Assessment of Cluster Analysis and Community Detection
TLDR
This work proposes a new empirical evaluation framework that is not tied to any specific measure or application, so it can be applied to any situation; it is illustrated on a selection of standard measures and put into practice through two concrete use cases.
On Using Class-Labels in Evaluation of Clusterings
Although clustering has been studied for several decades, the fundamental problem of a valid evaluation has not yet been solved. The sound evaluation of clustering results in particular on real data…
Understanding information theoretic measures for comparing clusterings
TLDR
It is shown that a class of normalizations of the mutual information can be decomposed into indices that contain information at the level of individual clusters, revealing that overall measures can be interpreted as summary statistics of the information reflected in the individual clusters.
A review of conceptual clustering algorithms
TLDR
This work presents an overview of the most influential algorithms reported in the field of conceptual clustering, highlighting their limitations or drawbacks, and presents a taxonomy of these methods as well as a qualitative comparison of these algorithms.
An Improved Similarity-Based Clustering Algorithm for Multi-Database Mining
TLDR
This work uses coordinate descent and back-propagation to search for the optimal clustering of n multiple databases in a way that minimizes a convex clustering quality measure L(θ) in fewer than (n² − n)/2 iterations.
A Critical Note on the Evaluation of Clustering Algorithms
TLDR
It is suggested that the applicability of existing benchmark datasets should be carefully revisited and significant efforts need to be devoted to improving the current practice of experimental evaluation of clustering algorithms to ensure an essential match between algorithms and problems.

References

A Method for Comparing Two Hierarchical Clusterings
This article concerns the derivation and use of a measure of similarity between two hierarchical clusterings. The measure, B_k, is derived from the matching matrix, [m_ij], formed by cutting…
CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling
TLDR
A novel hierarchical clustering algorithm called CHAMELEON that measures the similarity of two clusters based on a dynamic model and can discover natural clusters that many existing state-of-the-art clustering algorithms fail to find.
Chameleon: Hierarchical Clustering Using Dynamic Modeling
TLDR
Chameleon's key feature is that it accounts for both interconnectivity and closeness in identifying the most similar pair of clusters, which is important for dealing with highly variable clusters.
On Clustering Validation Techniques
TLDR
The fundamental concepts of clustering are introduced, widely known clustering algorithms are surveyed in a comparative way, and issues that are under-addressed by recent algorithms are illustrated.
Comparing Clusterings by the Variation of Information
  • M. Meila
  • Mathematics, Computer Science
  • COLT
  • 2003
TLDR
This paper proposes an information theoretic criterion for comparing two partitions, or clusterings, of the same data set, called variation of information (VI), which is positive, symmetric and obeys the triangle inequality.
Multidimensional scaling of measures of distance between partitions
The techniques of multidimensional scaling were used to study the numerical behavior of twelve measures of distance between partitions, as applied to partition lattices of four different…
Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions
TLDR
This paper introduces the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determined these partitionings and proposes three effective and efficient techniques for obtaining high-quality combiners (consensus functions).
Objective Criteria for the Evaluation of Clustering Methods
TLDR
This article proposes several criteria which isolate specific aspects of the performance of a method, such as its retrieval of inherent structure, its sensitivity to resampling and the stability of its results in the light of new data.
Robust data clustering
  • A. Fred, Anil K. Jain
  • Mathematics, Computer Science
  • 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings.
  • 2003
TLDR
It is shown that the proposed technique attempts to optimize the mutual information based criteria, although the optimality is not ensured in all situations, and experimental results show the ability of the technique to identify clusters with arbitrary shapes and sizes.
Asymmetric binary similarity measures
TLDR
A new coefficient, “C”, is introduced which overcomes problems and leads to homogeneous classifications in the sense described above, and further general recommendations are made for the use of these coefficients in various contexts.