Comparing High-Dimensional Partitions with the Co-clustering Adjusted Rand Index

  • Valerie Robert, Yann Vasseur, Vincent Brault
  • Journal of Classification
We consider the simultaneous clustering of rows and columns of a matrix, and more particularly the ability to measure the agreement between two co-clustering partitions. The new criterion we develop, called the Co-clustering Adjusted Rand Index (CARI), is based on the Adjusted Rand Index. We also suggest improvements to existing criteria such as the classification error, which counts the proportion of misclassified cells, and the Extended Normalized Mutual Information criterion…
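
The core idea can be illustrated directly: each cell (i, j) of the matrix receives a block label (row cluster, column cluster), and two co-clusterings are compared by applying the Adjusted Rand Index to the induced cell partitions. A minimal sketch of that idea (a brute-force cell-level computation, not the paper's efficient contingency-table formulation; `cari_cellwise` is our own illustrative name, and the toy labels are made up):

```python
from itertools import product
from math import comb
from collections import Counter

def ari(u, v):
    """Adjusted Rand Index between two flat partitions given as label lists."""
    n = len(u)
    nij = Counter(zip(u, v))          # contingency-table counts n_ij
    a = Counter(u)                    # row margins a_i
    b = Counter(v)                    # column margins b_j
    sum_ij = sum(comb(c, 2) for c in nij.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

def cari_cellwise(z1, w1, z2, w2):
    """Compare two co-clusterings (row labels z, column labels w) by
    labelling every cell (i, j) with its (row-cluster, column-cluster)
    pair and running the ARI on those cell labels."""
    cells1 = [(z1[i], w1[j]) for i, j in product(range(len(z1)), range(len(w1)))]
    cells2 = [(z2[i], w2[j]) for i, j in product(range(len(z2)), range(len(w2)))]
    return ari(cells1, cells2)

# Identical co-clusterings agree perfectly:
z, w = [0, 0, 1, 1], [0, 1, 1]
print(cari_cellwise(z, w, z, w))  # 1.0
```

This cell-level version costs O(n·d) per matrix; the paper's contribution is precisely to compute the same comparison efficiently from the row and column contingency tables.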

A Multi-kernel Semi-supervised Metric Learning Using Multi-objective Optimization Approach

This work divides the original kernel space into multiple kernel sub-spaces so that each kernel can be processed independently and in parallel on a GPU; kernel semi-supervised metric learning with a multi-objective approach is then applied to the individual kernels in parallel.

Co-clustering of Time-Dependent Data via the Shape Invariant Model

A new co-clustering methodology for grouping individuals and variables simultaneously, designed to handle both functional and longitudinal data, is proposed by embedding the shape invariant model in the latent block model via a suitable modification of the SEM-Gibbs algorithm.

Multi-objective clustering algorithm using particle swarm optimization with crowding distance (MCPSO-CD)

A clustering method that uses the crowding distance (CD) technique to balance the optimality of the objectives during the Pareto-optimal solution search is proposed; it relies on the dominance concept and the crowding-distance mechanism to guarantee survival of the best solutions.
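
The crowding-distance mechanism referred to above is the one popularized by NSGA-II: solutions on a Pareto front are scored by how isolated they are in objective space, and boundary solutions are kept unconditionally. A minimal sketch of the standard computation (function name and toy front are ours):

```python
import math

def crowding_distance(front):
    """Crowding distance (NSGA-II style) for one Pareto front.

    front: list of objective-value tuples, one per solution.
    Returns one distance per solution; boundary solutions get math.inf
    so they always survive selection."""
    n = len(front)
    dist = [0.0] * n
    for k in range(len(front[0])):                # one sweep per objective
        order = sorted(range(n), key=lambda i: front[i][k])
        dist[order[0]] = dist[order[-1]] = math.inf
        span = front[order[-1]][k] - front[order[0]][k]
        if span == 0:
            continue
        for a in range(1, n - 1):                 # interior solutions only
            i = order[a]
            if not math.isinf(dist[i]):
                dist[i] += (front[order[a + 1]][k] - front[order[a - 1]][k]) / span
    return dist

# Four non-dominated solutions on two minimization objectives:
print(crowding_distance([(1, 5), (2, 3), (4, 2), (6, 1)]))
```

In selection, a solution with a larger crowding distance is preferred among solutions of equal dominance rank, which spreads the surviving population along the front.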

Similarity and Dissimilarity Regularized Nonnegative Matrix Factorization for Single-Cell RNA-seq Analysis

A novel clustering method, Similarity and Dissimilarity Regularized Nonnegative Matrix Factorization (SDCNMF), that simultaneously imposes similarity and dissimilarity constraints on low-dimensional representations is proposed.

Co-clustering of evolving count matrices in pharmacovigilance with the dynamic latent block model

DLBM was able not only to identify clusters coherent with retrospective knowledge, in particular for major drug-related crises, but also to detect atypical behaviors of which health professionals were unaware, demonstrating its potential as a routine tool in pharmacovigilance.

A Prototype based Hybrid Approach to speed-up Kernel FCM-K

  • K. Mrudula
  • 2019 1st International Conference on Electrical, Control and Instrumentation Engineering (ICECIE), 2019
A new prototype-based hybrid technique to speed up KFCM-K for large data sets is proposed; an experimental study on several benchmark data sets shows that the proposed method converges in less time than conventional KFCM-K, with only a negligible deviation in clustering quality.

Adaptive Total-Variation Regularized Low-Rank Representation for Analyzing Single-Cell RNA-seq Data.

A new single-cell data analysis model called Adaptive Total-Variation Regularized Low-Rank Representation (ATV-LRR) is introduced that detects cell types more effectively and stably, and removes cell noise while preserving cell feature details by learning the gradient information of the data.

Using BLE beacons and Machine Learning for Personalized Customer Experience in Smart Cafés

A pervasive environment that utilizes Bluetooth Low Energy beacons in conjunction with unsupervised machine learning to personalize a customer's visit to a coffee shop is proposed.

Machine Learning for Cyber Security: Third International Conference, ML4CS 2020, Guangzhou, China, October 8–10, 2020, Proceedings, Part III

A deep convolutional neural network based image segmentation model is investigated to achieve salt mine recognition, and its efficiency is demonstrated in terms of loss value and recognition accuracy.

Comparing clusterings---an information based distance

Information-theoretic co-clustering

This work presents an innovative co-clustering algorithm that monotonically increases the preserved mutual information by intertwining both the row and column clusterings at all stages and demonstrates that the algorithm works well in practice, especially in the presence of sparsity and high-dimensionality.
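
The quantity being preserved can be made concrete: given a joint distribution p(x, y), a co-clustering compresses it to p(x̂, ŷ), and the quality of the co-clustering is the mutual information retained, I(X̂; Ŷ) ≤ I(X; Y). A small numpy sketch on a toy block-structured distribution (this illustrates the objective only, not Dhillon et al.'s alternating algorithm):

```python
import numpy as np

def mutual_information(p):
    """Mutual information (in bits) of a joint distribution given as a 2-D array."""
    px = p.sum(axis=1, keepdims=True)     # row marginals
    py = p.sum(axis=0, keepdims=True)     # column marginals
    mask = p > 0
    return float((p[mask] * np.log2(p[mask] / (px @ py)[mask])).sum())

def compress(p, row_labels, col_labels):
    """Aggregate a joint distribution according to row/column cluster labels."""
    q = np.zeros((max(row_labels) + 1, max(col_labels) + 1))
    for i, zi in enumerate(row_labels):
        for j, wj in enumerate(col_labels):
            q[zi, wj] += p[i, j]
    return q

# Block-structured toy distribution: a co-clustering that matches the
# blocks loses no mutual information.
p = np.array([[0.25, 0.25, 0.0, 0.0],
              [0.0,  0.0,  0.25, 0.25]])
q = compress(p, [0, 1], [0, 0, 1, 1])
print(mutual_information(p), mutual_information(q))  # 1.0 1.0
```

On distributions without exact block structure, the compressed value is strictly smaller, and the algorithm's job is to choose the row and column clusterings that minimize that loss.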

A Method for Comparing Two Hierarchical Clusterings

A measure of similarity between two hierarchical clusterings, Bk, is derived from the matching matrix, [mij], formed by cutting the two hierarchical trees and counting the number of matching entries in the k clusters of each tree.
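
For a single cut at k clusters per tree, the Fowlkes–Mallows statistic reduces to Bk = Tk / sqrt(Pk·Qk), with Tk = Σ mij² − n, Pk = Σ mi·² − n, Qk = Σ m·j² − n computed from the matching matrix. A direct sketch from flat labels (function name is ours):

```python
from collections import Counter
from math import sqrt

def fowlkes_mallows(u, v):
    """B_k between two flat clusterings, e.g. two trees each cut at k clusters."""
    n = len(u)
    m = Counter(zip(u, v))                         # matching-matrix entries m_ij
    tk = sum(c * c for c in m.values()) - n
    pk = sum(c * c for c in Counter(u).values()) - n
    qk = sum(c * c for c in Counter(v).values()) - n
    return tk / sqrt(pk * qk)

print(fowlkes_mallows([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0
```

The full Bk procedure of the paper repeats this for every cut level k and plots Bk against k to compare the two hierarchies across scales.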

On comparing partitions

Rand (1971) proposed the Rand Index to measure the stability of two partitions of one set of units. Hubert and Arabie (1985) corrected the Rand Index for chance, yielding the Adjusted Rand Index. In this paper, …
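
For reference, the Hubert–Arabie correction takes the general form (index − expected index) / (maximum index − expected index); with contingency table entries $n_{ij}$ and margins $a_i = \sum_j n_{ij}$, $b_j = \sum_i n_{ij}$, the Adjusted Rand Index reads:

```latex
\mathrm{ARI} =
\frac{\sum_{ij}\binom{n_{ij}}{2}
      - \left[\sum_i \binom{a_i}{2}\sum_j \binom{b_j}{2}\right]\Big/\binom{n}{2}}
     {\tfrac{1}{2}\left[\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}\right]
      - \left[\sum_i \binom{a_i}{2}\sum_j \binom{b_j}{2}\right]\Big/\binom{n}{2}}
```

The index equals 1 for identical partitions and has expected value 0 under random labelings with fixed margins.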

Bayesian Co-clustering

This paper presents Bayesian co-clustering models, that allow a mixed membership in row and column clusters, and proposes a fast variational algorithm for inference and parameter estimation.

Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance

An organized study of information-theoretic measures for clustering comparison is presented, covering several popular existing measures as well as some newly proposed ones; the normalized information distance (NID) is advocated as a general measure of choice.
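
The NID advocated there is a true metric on partitions: NID(U, V) = 1 − I(U; V) / max(H(U), H(V)), so identical partitions are at distance 0 and independent ones at distance 1. A sketch (entropies in nats; function names are ours):

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy of a partition given as a label list."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def mutual_info(u, v):
    """Mutual information between two partitions of the same items."""
    n = len(u)
    joint = Counter(zip(u, v))
    pu, pv = Counter(u), Counter(v)
    return sum((c / n) * log(c * n / (pu[a] * pv[b]))
               for (a, b), c in joint.items())

def nid(u, v):
    """Normalized information distance: 0 = identical, 1 = independent."""
    h = max(entropy(u), entropy(v))
    return 1.0 - mutual_info(u, v) / h if h > 0 else 0.0

# Identical partitions sit at distance ~0; independent ones at distance 1.
print(round(nid([0, 0, 1, 1], [0, 0, 1, 1]), 6),
      round(nid([0, 0, 1, 1], [0, 1, 0, 1]), 6))
```

Unlike the raw mutual information, this normalization keeps the distance in [0, 1] and invariant to relabeling of clusters.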

On Similarity Indices and Correction for Chance Agreement

It is shown that, of the 28 indices introduced in the literature, only 22 are distinct; and even though their values differ for the same pair of clusterings, after correcting for agreement attributable to chance their values become similar and some even become equivalent.

Characterization and evaluation of similarity measures for pairs of clusterings

A paradigm apparatus for evaluating clustering comparison techniques is introduced, and a novel clustering similarity measure, the Measure of Concordance (MoC), is proposed; only MoC, Powers's measure, Lopez and Rajski's measure, and various forms of Normalised Mutual Information exhibit the desired behaviour under each of the test scenarios.

Objective Criteria for the Evaluation of Clustering Methods

This article proposes several criteria which isolate specific aspects of the performance of a method, such as its retrieval of inherent structure, its sensitivity to resampling and the stability of its results in the light of new data.

Comparing partitions

The problem of comparing two different partitions of a finite set of objects reappears continually in the clustering literature. We begin by reviewing a well-known measure of partition correspondence…