Corpus ID: 221586094

Biclustering with Alternating K-Means

Nicolas Fraiman, Zichao Li
Biclustering is the task of simultaneously clustering the rows and columns of a data matrix into subgroups such that the rows and columns within a subgroup exhibit similar patterns. In this paper, we consider the case of producing exclusive row and column biclusters. We provide a new formulation of the biclustering problem based on the idea of minimizing the empirical clustering risk, and we prove a consistency result with respect to the empirical clustering risk. Since the… 
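
The abstract is cut off before it describes the algorithm, so the following is only a generic sketch of an alternating k-means scheme under stated assumptions, not the authors' method. It assumes a checkerboard model in which each entry is approximated by the mean of its (row cluster, column cluster) block, and it takes the empirical clustering risk to be the mean squared deviation from those block means. Rows are reassigned with column labels fixed, then columns with row labels fixed. The function names, the farthest-first initialization and the stopping rule are hypothetical choices made for this example.

```python
import numpy as np


def block_means(X, r, c, k, l):
    """Mean of X over each (row cluster, column cluster) block; empty blocks stay 0."""
    M = np.zeros((k, l))
    for a in range(k):
        for b in range(l):
            block = X[np.ix_(r == a, c == b)]
            if block.size:
                M[a, b] = block.mean()
    return M


def empirical_risk(X, r, c, M):
    """Mean squared error of the piecewise-constant (checkerboard) fit M[r, c]."""
    return float(np.mean((X - M[np.ix_(r, c)]) ** 2))


def farthest_first_labels(Y, k):
    """Pick k seed rows by farthest-first traversal, then label rows by nearest seed."""
    seeds = [0]
    d = np.linalg.norm(Y - Y[0], axis=1)
    for _ in range(1, k):
        seeds.append(int(d.argmax()))
        d = np.minimum(d, np.linalg.norm(Y - Y[seeds[-1]], axis=1))
    dist = np.linalg.norm(Y[:, None, :] - Y[seeds][None, :, :], axis=2)
    return dist.argmin(axis=1)


def alternating_biclusters(X, k, l, n_iter=100):
    """Hypothetical alternating k-means biclustering with k row and l column clusters."""
    r = farthest_first_labels(X, k)
    c = farthest_first_labels(X.T, l)
    M = block_means(X, r, c, k, l)
    history = [empirical_risk(X, r, c, M)]
    for _ in range(n_iter):
        r_old, c_old = r, c
        # Row step (columns fixed): each row joins the row cluster whose
        # block-mean profile M[a, c] is closest in squared error.
        r = ((X[:, None, :] - M[:, c][None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        M = block_means(X, r, c, k, l)
        # Column step (rows fixed): the same update applied to columns.
        c = ((X.T[:, None, :] - M[r, :].T[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        M = block_means(X, r, c, k, l)
        history.append(empirical_risk(X, r, c, M))
        if np.array_equal(r, r_old) and np.array_equal(c, c_old):
            break
    return r, c, M, history


# Usage: a noisy 2 x 2 checkerboard with known row groups g and column groups h.
rng = np.random.default_rng(0)
A = np.array([[0.0, 4.0], [8.0, 2.0]])
g = np.repeat([0, 1], 10)
h = np.tile([0, 1], 8)
X = A[np.ix_(g, h)] + 0.1 * rng.standard_normal((20, 16))
r, c, M, history = alternating_biclusters(X, k=2, l=2)
print(r, c, history[-1])
```

Each reassignment step and each block-mean update can only lower this squared-error risk, so `history` is non-increasing. As with ordinary k-means, the procedure can still stop at a local minimum, which is why the initialization matters.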


Kernel Biclustering algorithm in Hilbert Spaces

A new model-free biclustering algorithm for abstract spaces is developed, based on energy distance and maximum mean discrepancy, two distances between probability distributions that can handle complex data such as curves or graphs.
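
For readers unfamiliar with energy distance, here is the standard V-statistic estimate of 2 E||X - Y|| - E||X - X'|| - E||Y - Y'|| between two samples. This is only an illustration of the textbook definition; how the Kernel Biclustering paper uses it is not described here, and the helper names are hypothetical.

```python
import numpy as np


def pairwise_distances(A, B):
    """Euclidean distances between every row of A and every row of B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)


def energy_distance(X, Y):
    """V-statistic estimate of 2 E||X - Y|| - E||X - X'|| - E||Y - Y'||.

    X has shape (n, d) and Y has shape (m, d); each row is one observation.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    cross = pairwise_distances(X, Y).mean()
    within_x = pairwise_distances(X, X).mean()
    within_y = pairwise_distances(Y, Y).mean()
    return 2.0 * cross - within_x - within_y


rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
print(energy_distance(X, X))        # 0.0 for identical samples
print(energy_distance(X, X + 3.0))  # a shifted sample gives a positive value
```

The estimate is zero when the two samples coincide and grows as their distributions move apart, which is what makes it usable as a dissimilarity between clusters of rows or columns.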

Optimal Variable Clustering for High-Dimensional Matrix Valued Data

This work proposes a new latent variable model for features arranged in matrix form, in which unknown membership matrices represent the row and column clusters, and establishes the minimax lower bound for clustering under this model.

Classification with Nearest Disjoint Centroids

The results demonstrate that the nearest disjoint centroid classifier can outperform competing classifiers, achieving smaller misclassification rates and/or using fewer features, in a variety of settings.

Profile likelihood biclustering

A new heuristic optimization procedure based on the Kernighan-Lin heuristic is proposed; it has nice computational properties and performs well in simulations, and it is proved that the procedure recovers the true row and column classes when the dimensions of the data matrix tend to infinity.

Sparse Biclustering of Transposable Data

  • Kean Ming Tan, D. Witten
  • Journal of Computational and Graphical Statistics
  • 2014
This paper proposes a framework for biclustering based on the matrix-variate normal distribution and shows that k-means clustering of the rows and of the columns of a data matrix can be seen as special cases of this proposal and that a relaxation of the proposal yields the singular value decomposition.

Convex biclustering

This work presents a convex formulation of the biclustering problem that has a unique global minimizer, together with an iterative algorithm, COBRA, that is guaranteed to find it. It demonstrates the advantages of the approach, which include stable and reproducible identification of biclusterings, on simulated and real microarray data.
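
For orientation, the convex biclustering objective, as we recall it from the COBRA work (check the exact weights and notation against the paper), approximates a p × n data matrix X by a matrix U while penalizing differences between its columns and between its rows, with tuning parameter γ ≥ 0 and nonnegative weights w_ij and w̃_ij:

```latex
\min_{\mathbf{U} \in \mathbb{R}^{p \times n}} \;
\frac{1}{2}\,\lVert \mathbf{X} - \mathbf{U} \rVert_F^2
+ \gamma \Big[ \sum_{1 \le i < j \le n} w_{ij}\, \lVert \mathbf{U}_{\cdot i} - \mathbf{U}_{\cdot j} \rVert_2
+ \sum_{1 \le i < j \le p} \tilde{w}_{ij}\, \lVert \mathbf{U}_{i \cdot} - \mathbf{U}_{j \cdot} \rVert_2 \Big]
```

The squared Frobenius loss is strictly convex in U and the penalties are convex, so the whole objective is strictly convex, which is why the minimizer is unique. As γ grows, columns and rows of U fuse together, producing a checkerboard of biclusters.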

Robust biclustering by sparse singular value decomposition incorporating stability selection

The S4VD algorithm is proposed, which incorporates stability selection into biclustering by sparse singular value decomposition; it is the first biclustering approach that takes cluster stability under perturbations of the data into account.

Biclustering algorithms for biological data analysis: a survey

In this comprehensive survey, a large number of existing approaches to biclustering are analyzed and classified according to the type of biclusters they can find, the patterns of biclusters that are discovered, the methods used to perform the search, the approaches used to evaluate the solution, and the target applications.

Finding large average submatrices in high dimensional data

A statistically motivated biclustering procedure is proposed that finds large average submatrices within a given real-valued data matrix. It is driven by a Bonferroni-based significance score that trades off submatrix size against average value.
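
A minimal sketch of a score of this kind, assuming (as we understand the procedure, to be checked against the paper) that the m × n matrix is standardized so entries are N(0, 1) under the null, and that a k × l submatrix with average τ scores -log[C(m, k) C(n, l) Φ(-τ√(kl))]. The Φ term is the chance that a k × l block of noise averages at least τ, and the binomial factors are the Bonferroni count of k × l blocks. Function names are hypothetical.

```python
import math


def log_binom(n, k):
    """log C(n, k) via log-gamma, safe for large n."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_normal_upper_tail(x):
    """log P(Z > x) for standard normal Z, i.e. log Phi(-x).

    Uses erfc, which stays accurate in the upper tail but underflows to 0
    somewhere near x = 38, at which point math.log raises ValueError.
    """
    return math.log(0.5 * math.erfc(x / math.sqrt(2.0)))


def las_style_score(avg, k, l, m, n):
    """Hypothetical Bonferroni-style score for a k x l submatrix with mean avg.

    Assumes the m x n matrix is standardized so entries are N(0, 1) under the null.
    The factor C(m, k) * C(n, l) counts the candidate k x l submatrices.
    """
    return -(log_binom(m, k) + log_binom(n, l)
             + log_normal_upper_tail(avg * math.sqrt(k * l)))


# For a fixed block size, a larger average earns a higher score, while the
# Bonferroni term charges for the many ways such a block could be chosen.
print(las_style_score(1.0, 10, 10, 1000, 200))
print(las_style_score(0.5, 10, 10, 1000, 200))
```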

QUBIC: a qualitative biclustering algorithm for analyses of gene expression data

A QUalitative BIClustering algorithm (QUBIC) is presented that can solve the biclustering problem in a more general form than existing algorithms by combining qualitative (or semi-quantitative) measures of gene expression data with a combinatorial optimization technique.

Biclustering microarray data by Gibbs sampling

A simple probabilistic model of the biclusters is chosen because it has the key advantage of providing an easily interpretable fingerprint and does not suffer from the problem of local minima that often characterizes Expectation-Maximization.

Enhanced biclustering on expression data

The bicluster model is generalized to incorporate null values, and a probabilistic algorithm (FLOC) is proposed that can discover a set of k possibly overlapping biclusters simultaneously and can be extended, at little additional cost, to support features that suit different requirements.

Discovering statistically significant biclusters in gene expression data

A new method for detecting significant biclusters in large expression datasets is proposed. In cancer data it detects and relates finer tissue types than was previously possible, and it outperforms the biclustering algorithm of Cheng and Church (2000).