Comparing High-Dimensional Partitions with the Co-clustering Adjusted Rand Index

Valerie Robert, Yann Vasseur and Vincent Brault, Journal of Classification.
We consider the simultaneous clustering of rows and columns of a matrix, and more particularly the ability to measure the agreement between two co-clustering partitions. The new criterion we develop, called the Co-clustering Adjusted Rand Index (CARI), is based on the Adjusted Rand Index. We also suggest improvements to existing criteria such as the classification error, which counts the proportion of misclassified cells, and the Extended Normalized Mutual Information criterion…
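As an illustrative sketch of the kind of comparison CARI addresses, one can reduce a pair of co-clustering partitions to the partition they induce on the cells of the matrix and score the agreement with the ordinary Adjusted Rand Index. Note this cell-pairing construction is a simplification for illustration, not the exact CARI definition, which works on the co-clustering contingency table.

```python
# Illustrative sketch only: comparing two co-clustering partitions via the
# ordinary ARI on the cell partition they induce. This is a simplification,
# not the exact CARI criterion from the paper.
import numpy as np
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
n_rows, n_cols = 20, 15

# Two co-clusterings, each given as (row labels, column labels).
z1, w1 = rng.integers(0, 3, n_rows), rng.integers(0, 2, n_cols)
z2, w2 = rng.integers(0, 3, n_rows), rng.integers(0, 2, n_cols)

# Induced partition of the n_rows * n_cols cells: one label per (i, j) cell,
# encoding the (row-cluster, column-cluster) pair.
cells1 = (z1[:, None] * 10 + w1[None, :]).ravel()
cells2 = (z2[:, None] * 10 + w2[None, :]).ravel()

print(adjusted_rand_score(cells1, cells2))  # near 0 for unrelated partitions
```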

Semi-supervised Latent Block Model with pairwise constraints

This paper presents a general probabilistic framework for incorporating must-link and cannot-link relationships into the LBM, based on Hidden Markov Random Fields. The framework is instantiated on a model for count data, and two inference algorithms based on Variational and Classification EM are presented.
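To make the pairwise constraints concrete, a hypothetical helper (not from the paper, and unrelated to its HMRF-based inference) can check whether a given row partition satisfies a set of must-link and cannot-link pairs:

```python
# Hypothetical helper, for illustration only: verify that a partition
# respects must-link (same cluster) and cannot-link (different cluster)
# pairwise constraints.
def satisfies_constraints(labels, must_link, cannot_link):
    """labels: sequence of cluster ids; constraints: iterables of index pairs."""
    ok_must = all(labels[i] == labels[j] for i, j in must_link)
    ok_cannot = all(labels[i] != labels[j] for i, j in cannot_link)
    return ok_must and ok_cannot

labels = [0, 0, 1, 1, 2]
print(satisfies_constraints(labels, [(0, 1)], [(1, 2)]))  # True
```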

A Multi-kernel Semi-supervised Metric Learning Using Multi-objective Optimization Approach

This work divides the original kernel space into multiple kernel sub-spaces so that each kernel can be processed independently and in parallel on advanced GPUs; multi-kernel semi-supervised metric learning using a multi-objective approach is then applied to the individual kernels in parallel.

Co-clustering of evolving count matrices with the dynamic latent block model: application to pharmacovigilance

The dynamic latent block model (dLBM) is proposed, extending the classical binary latent block model to dynamic settings where the data are counts. It operates on temporal count matrices and can detect abrupt changes in the way existing clusters interact with each other.

Co-clustering of Time-Dependent Data via the Shape Invariant Model

A new co-clustering methodology for grouping individuals and variables simultaneously, designed to handle both functional and longitudinal data, is proposed by embedding the shape invariant model in the latent block model via a suitable modification of the SEM-Gibbs algorithm.

Multi-objective clustering algorithm using particle swarm optimization with crowding distance (MCPSO-CD)

A clustering method is proposed that uses the crowding distance (CD) technique to balance the optimality of the objectives when searching for Pareto-optimal solutions; it relies on the dominance concept and the crowding-distance mechanism to guarantee survival of the best solutions.
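The crowding-distance mechanism mentioned above can be sketched as in NSGA-II: boundary solutions on each objective get infinite distance, and interior solutions accumulate the normalized gap between their neighbours, so less crowded solutions are favoured. This is a generic sketch of the CD computation, not the MCPSO-CD algorithm itself.

```python
import numpy as np

# Generic sketch of the crowding-distance (CD) mechanism used to keep
# Pareto-front solutions spread out (as in NSGA-II): boundary points get
# infinite distance, interior points the normalized gap between neighbours.
def crowding_distance(objectives):
    objectives = np.asarray(objectives, dtype=float)  # shape (n, m)
    n, m = objectives.shape
    dist = np.zeros(n)
    for k in range(m):
        order = np.argsort(objectives[:, k])
        span = objectives[order[-1], k] - objectives[order[0], k]
        dist[order[0]] = dist[order[-1]] = np.inf
        if span == 0:
            continue
        for idx in range(1, n - 1):
            gap = objectives[order[idx + 1], k] - objectives[order[idx - 1], k]
            dist[order[idx]] += gap / span
    return dist

front = [(1.0, 4.0), (2.0, 3.0), (3.0, 2.0), (4.0, 1.0)]
print(crowding_distance(front))  # boundary points come out as inf
```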

Similarity and Dissimilarity Regularized Nonnegative Matrix Factorization for Single-Cell RNA-seq Analysis

A novel clustering method, Similarity and Dissimilarity Regularized Nonnegative Matrix Factorization (SDCNMF), is proposed that simultaneously imposes similarity and dissimilarity constraints on low-dimensional representations. It outperforms comparative methods, and the gene markers it finds are consistent with previous studies.
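As a baseline for the NMF-based clustering pipeline that SDCNMF builds on, one can factorize a count matrix and cluster the low-dimensional factors. This sketch uses plain scikit-learn NMF on synthetic data; SDCNMF additionally regularizes the factorization with similarity and dissimilarity constraints, which plain NMF does not support.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.cluster import KMeans

# Baseline sketch only: plain NMF followed by k-means on the low-dimensional
# factors. SDCNMF adds similarity/dissimilarity regularizers on top of this.
rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(60, 200)).astype(float)  # stand-in for a count matrix

W = NMF(n_components=4, init="nndsvda", random_state=0, max_iter=500).fit_transform(X)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(W)
print(labels.shape)  # one cluster label per row (e.g. per cell)
```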

Goodness-of-fit Test for Latent Block Models

Co-clustering of evolving count matrices in pharmacovigilance with the dynamic latent block model

The dLBM was not only able to identify clusters that are coherent with retrospective knowledge, in particular for major drug-related crises, but also to detect atypical behaviors of which health professionals were unaware, demonstrating its potential as a routine tool in pharmacovigilance.



Comparing clusterings---an information based distance

Information-theoretic co-clustering

This work presents an innovative co-clustering algorithm that monotonically increases the preserved mutual information by intertwining the row and column clusterings at all stages, and demonstrates that the algorithm works well in practice, especially in the presence of sparsity and high dimensionality.
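The "preserved mutual information" can be illustrated directly: given a joint distribution p(X, Y), aggregating rows and columns into co-clusters can never increase I(X; Y), and information-theoretic co-clustering searches for the aggregation that retains as much of it as possible. A minimal sketch, with made-up counts and cluster maps:

```python
import numpy as np

# Sketch: I(X; Y) of a joint distribution, before and after aggregating
# rows/columns into co-clusters. Aggregation cannot increase it; the
# algorithm searches for the cluster maps that retain the most.
def mutual_information(p):
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    mask = p > 0
    return float((p[mask] * np.log(p[mask] / (px @ py)[mask])).sum())

counts = np.array([[10, 1, 1], [8, 2, 1], [1, 9, 10], [2, 8, 9]])
row_map, col_map = [0, 0, 1, 1], [0, 1, 1]  # illustrative co-cluster maps

agg = np.zeros((2, 2))
for i, r in enumerate(row_map):
    for j, c in enumerate(col_map):
        agg[r, c] += counts[i, j]

print(mutual_information(counts) >= mutual_information(agg))  # True
```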

A Method for Comparing Two Hierarchical Clusterings

A measure of similarity between two hierarchical clusterings, Bk, is derived from the matching matrix, [mij], formed by cutting the two hierarchical trees and counting the number of matching entries in the k clusters of each tree.
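At a single level k, Bk is the Fowlkes-Mallows index between the two flat partitions obtained by cutting each tree into k clusters, which scikit-learn exposes directly:

```python
from sklearn.metrics import fowlkes_mallows_score

# Bk at one level k reduces to the Fowlkes-Mallows index between the flat
# partitions obtained by cutting each hierarchical tree into k clusters.
cut_a = [0, 0, 1, 1, 2, 2]  # tree 1 cut into k = 3 clusters (example labels)
cut_b = [0, 0, 1, 2, 2, 1]  # tree 2 cut into k = 3 clusters (example labels)
print(fowlkes_mallows_score(cut_a, cut_b))
```

Computing this score for each k as the trees are cut at successive levels recovers the Bk profile studied in the paper.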

On comparing partitions

Rand (1971) proposed the Rand Index to measure the stability of two partitions of one set of units. Hubert and Arabie (1985) corrected the Rand Index for chance (Adjusted Rand Index). In this paper,
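The chance correction of Hubert and Arabie takes the standard form below, where n_{ij} is the contingency table of the two partitions, a_i and b_j are its row and column margins, and n is the number of units:

```latex
\mathrm{ARI} =
\frac{\sum_{ij}\binom{n_{ij}}{2}
      - \left[\sum_i \binom{a_i}{2}\sum_j \binom{b_j}{2}\right]\Big/\binom{n}{2}}
     {\tfrac{1}{2}\left[\sum_i \binom{a_i}{2}+\sum_j \binom{b_j}{2}\right]
      - \left[\sum_i \binom{a_i}{2}\sum_j \binom{b_j}{2}\right]\Big/\binom{n}{2}}
```

The index equals 1 for identical partitions and has expected value 0 under random labellings with fixed margins.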

Bayesian Co-clustering

This paper presents Bayesian co-clustering models that allow mixed membership in row and column clusters, and proposes a fast variational algorithm for inference and parameter estimation.

Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance

An organized study of information-theoretic measures for clustering comparison, covering several popular existing measures as well as some newly proposed ones; the normalized information distance (NID) is advocated as a general-purpose measure of choice.
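The advocated NID can be sketched from its definition, NID(U, V) = 1 - I(U; V) / max(H(U), H(V)), using natural-log entropies; it is 0 for identical partitions and 1 for independent ones. A minimal sketch:

```python
import numpy as np
from sklearn.metrics import mutual_info_score

# Sketch of the normalized information distance (NID) between two partitions:
# NID(U, V) = 1 - I(U; V) / max(H(U), H(V)), entropies in nats.
def entropy_nats(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def nid(labels_u, labels_v):
    h = max(entropy_nats(labels_u), entropy_nats(labels_v))
    if h == 0:
        return 0.0  # both partitions are trivial (single cluster)
    return 1.0 - mutual_info_score(labels_u, labels_v) / h

print(nid([0, 0, 1, 1], [0, 0, 1, 1]))  # ~0 for identical partitions
print(nid([0, 0, 1, 1], [0, 1, 0, 1]))  # ~1 for independent partitions
```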

On Similarity Indices and Correction for Chance Agreement

It is shown that, of 28 indices introduced in the literature, only 22 are distinct; and although their values differ for the same pair of clusterings, after correcting for agreement attributable to chance their values become similar, and some even become equivalent.

Characterization and evaluation of similarity measures for pairs of clusterings

A paradigm apparatus for evaluating clustering comparison techniques is introduced, and a novel clustering similarity measure, the Measure of Concordance (MoC), is proposed; only MoC, Powers's measure, Lopez and Rajski's measure and various forms of Normalised Mutual Information exhibit the desired behaviour under each of the test scenarios.

Objective Criteria for the Evaluation of Clustering Methods

This article proposes several criteria which isolate specific aspects of the performance of a method, such as its retrieval of inherent structure, its sensitivity to resampling and the stability of its results in the light of new data.

Comparing partitions

The problem of comparing two different partitions of a finite set of objects reappears continually in the clustering literature. We begin by reviewing a well-known measure of partition correspondence