Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach
@article{Pihur2007WeightedRA,
title={Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach},
author={Vasyl Pihur and Susmita Datta and Somnath Datta},
journal={Bioinformatics},
year={2007},
volume={23},
number={13},
pages={1607--1615}
}
MOTIVATION
Biologists often employ clustering techniques in the exploratory phase of microarray data analysis to discover relevant biological groupings. Given the availability of numerous clustering algorithms in the machine-learning literature, a user might want to select the one that performs best for their data set or application. While various validation measures have been proposed over the years to judge the quality of clusters produced by a given clustering algorithm, including their…
158 Citations
optCluster: An R Package for Determining the Optimal Clustering Algorithm
- Computer Science
- Bioinformation
- 2017
The optCluster package is introduced, an R package that uses a single function to simultaneously compare numerous clustering partitions and obtain a “best” option for a given dataset.
Robust rank aggregation for gene list integration and meta-analysis
- Computer Science
- Bioinform.
- 2012
This work proposes a novel robust rank aggregation (RRA) method that detects genes that are ranked consistently better than expected under null hypothesis of uncorrelated inputs and assigns a significance score for each gene.
On strategies for building effective ensembles of relative clustering validity criteria
- Computer Science
- Knowledge and Information Systems
- 2015
This paper proposes a method for selecting measures with minimum effectiveness and some degree of complementarity into ensembles, which show superior performance when compared to any single ensemble member (and not just the worst one) over a variety of different datasets.
How Many Clusters: A Validation Index for Arbitrary-Shaped Clusters
- Computer Science
- IEEE/ACM Transactions on Computational Biology and Bioinformatics
- 2013
A new validation index based on graph concepts is presented, which has been designed to find arbitrary shaped clusters by exploiting the spatial layout of the patterns and their clustering label, combined with a solid statistical detection framework, the gap statistic.
RankAggreg, an R package for weighted rank aggregation
- Biology
- BMC Bioinformatics
- 2008
The two examples described in the manuscript clearly show the utility of the RankAggreg package in the current bioinformatics context where ordered lists are routinely produced as a result of modern high-throughput technologies.
On the combination of relative clustering validity criteria
- Computer Science
- SSDBM
- 2013
An extensive study on the combination of relative criteria considering both synthetic and real datasets is presented and the shortcomings and possible benefits of combining different relative criteria into a committee are discussed.
Entropy steered Kendall's tau measure for a fair Rank Aggregation
- Computer Science
- 2011 2nd National Conference on Emerging Trends and Applications in Computer Science
- 2011
An important drawback of Kendall's tau distance is pointed out, and a modified measure based on Shannon's entropy formula is proposed; its benefit is illustrated on artificial and real data.
Weighted Markov Chain Based Aggregation of Biomolecule Orderings
- Computer Science
- IEEE/ACM Transactions on Computational Biology and Bioinformatics
- 2012
The effectiveness of the weighted Markov chain approach over the very recently proposed Genetic Algorithm and Cross-Entropy Monte Carlo (MC) algorithm-based techniques has been established for gene orderings from microarray analysis and orderings of predicted microRNA targets.
Integration of ranked lists via cross entropy Monte Carlo with applications to mRNA and microRNA Studies.
- Computer Science
- Biometrics
- 2009
Formulating the problem of integrating ranked lists as minimizing an objective criterion, this work explores the usage of a cross entropy Monte Carlo method for solving such a combinatorial problem.
References
Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes
- Computer Science
- BMC Bioinformatics
- 2006
Functional information of annotated genes available from various GO databases mined using ontology tools can be used to systematically judge the results of an unsupervised clustering algorithm as applied to a gene expression data set in clustering genes.
Comparisons and validation of statistical clustering techniques for microarray gene expression data
- Computer Science
- Bioinform.
- 2003
Six clustering algorithms are considered and it is shown that the group means produced by Diana are the closest and those produced by UPGMA are the farthest from a model profile based on a set of hand-picked genes.
Validating clustering for gene expression data
- Computer Science
- Bioinform.
- 2001
This work provides a systematic framework for assessing the results of clustering algorithms for gene expression data sets by applying a clustering algorithm to the data from all but one experimental condition.
Exploiting the Trade-off - The Benefits of Multiple Objectives in Data Clustering
- Computer Science
- EMO
- 2005
An advanced multiobjective clustering algorithm, MOCK, with the capacity to identify good solutions from the Pareto front, and to automatically determine the number of clusters in a data set is described.
Evolutionary Multiobjective Clustering
- Computer Science
- PPSN
- 2004
The PESA-II EA is adapted for the clustering problem by the incorporation of specialized mutation and initialization procedures, described herein, which exhibits a far more robust level of performance than both the classic k-means and average-link agglomerative clustering algorithms, outperforming them substantially on aggregate.
Computational cluster validation in post-genomic data analysis
- Computer Science
- Bioinform.
- 2005
This review paper aims to familiarize the reader with the battery of techniques available for the validation of clustering results, with a particular focus on their application to post-genomic data analysis.
Multiobjective data clustering
- Computer Science
- Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.
- 2004
A new clustering approach that uses multiple clustering objective functions simultaneously and includes detection of clusters by a set of candidate objective functions as well as their integration into the target partition.
A hierarchical unsupervised growing neural network for clustering gene expression patterns
- Computer Science
- Bioinform.
- 2001
A new approach to the analysis of gene expression data from DNA array experiments, using an unsupervised neural network that applies to any data, provided that they can be coded as a series of numbers and that a computable measure of similarity between data items can be used.
Model-based Gaussian and non-Gaussian clustering
- Computer Science
- 1993
The classification maximum likelihood approach is sufficiently general to encompass many current clustering algorithms, including those based on the sum of squares criterion and on the criterion of Friedman and Rubin (1967), but it is restricted to Gaussian distributions and it does not allow for noise.
Combinatorial Optimization, Cross-Entropy, Ants and Rare Events
- Mathematics, Computer Science
- 2001
It is shown how to solve network combinatorial optimization problems using a randomized algorithm based on the cross-entropy method, and it is shown that for a finite sample the algorithm converges with very high probability to a very small subset of the optimal values.






