• Corpus ID: 3168278

Using Rank-One Biclusters to Classify Microarray Data

Nasimeh Asgarian and Russell Greiner
Motivation: A DNA microarray measures the expression levels of tens of thousands of genes for a particular sample, corresponding to some specific experimental condition. […] Key Method: We propose a novel algorithm for finding biclusters in microarray data, based on the best rank-1 matrix approximation, then show how to use these biclusters to classify novel samples. We demonstrate that our method works effectively by comparing its prediction accuracy with that of other classifiers, including one based…
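The abstract says only that the biclusters are "based on the best rank-1 matrix approximation". As a reminder of what that object is, here is a minimal pure-Python sketch. By the Eckart–Young theorem, the best rank-1 approximation of a matrix A in the least-squares (Frobenius) sense is σ₁u₁v₁ᵀ, where (σ₁, u₁, v₁) is the leading singular triple; the sketch estimates that triple by power iteration.

The function names are hypothetical. The membership threshold `alpha`, which turns the loadings into a set of genes (rows) and samples (columns), is an illustrative assumption and not the authors' selection rule. The classification step is not shown.

```python
import math
import random


def best_rank_one(matrix, iters=200, tol=1e-10, seed=0):
    """Leading singular triple (sigma, u, v) of a nonzero matrix via power
    iteration; sigma * u v^T is then the best rank-1 approximation in the
    Frobenius (least-squares) sense."""
    m, n = len(matrix), len(matrix[0])
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(n)]  # positive random start
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    u, sigma = [0.0] * m, 0.0
    for _ in range(iters):
        # u <- A v, normalised to unit length
        u = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(m)]
        u_norm = math.sqrt(sum(x * x for x in u))
        u = [x / u_norm for x in u]
        # v <- A^T u; its length is the current estimate of sigma
        w = [sum(matrix[i][j] * u[i] for i in range(m)) for j in range(n)]
        sigma = math.sqrt(sum(x * x for x in w))
        new_v = [x / sigma for x in w]
        converged = max(abs(a - b) for a, b in zip(new_v, v)) < tol
        v = new_v
        if converged:
            break
    return sigma, u, v


def rank_one_bicluster(matrix, alpha=0.5):
    """HYPOTHETICAL membership rule (not from the paper): keep the genes
    (rows) and samples (columns) whose loading is at least `alpha` times
    the largest absolute loading in u or v respectively."""
    sigma, u, v = best_rank_one(matrix)
    u_max = max(abs(x) for x in u)
    v_max = max(abs(x) for x in v)
    rows = [i for i, x in enumerate(u) if abs(x) >= alpha * u_max]
    cols = [j for j, x in enumerate(v) if abs(x) >= alpha * v_max]
    return sigma, rows, cols


# Toy expression matrix: a strong block in genes 0-1 x samples 0-1.
expr = [
    [5.0, 5.0, 0.0, 0.1],
    [5.0, 5.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0],
]
sigma, rows, cols = rank_one_bicluster(expr)
print(rows, cols)  # [0, 1] [0, 1]
```

Power iteration is used here only to keep the sketch dependency-free. With NumPy available, the leading triple could instead be read off a full SVD.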


Finding large average submatrices in high dimensional data
A statistically motivated biclustering procedure that finds large average submatrices within a given real-valued data matrix and is driven by a Bonferroni-based significance score that effectively trades off between submatrix size and average value is proposed.
Bayesian Bi-clustering Methods with Applications in Computational Biology
A general Bayesian approach to tackling bi-clustering problems in high dimensions is outlined, and three Bayesian bi-clustering models on categorical data are proposed, which increase in complexity in terms of modeling the distributions of features across bi-clusters.
Detection of Low Rank Signals in Noise and Fast Correlation Mining with Applications to Large Biological Data
A new method is proposed, called FastMap, that exploits the discreteness of SNPs, and uses a permutation approach to account for multiple comparisons in the analysis of biomedical data.
Biclustering via Sparse Singular Value Decomposition
Sparse singular value decomposition (SSVD) is proposed as a new exploratory analysis tool for biclustering, or identifying interpretable row–column associations within high-dimensional data.
Nonnegative matrix factorization via rank-one downdate
An algorithm called rank-one downdate (R1D), partly motivated by the singular value decomposition, is proposed for computing an NMF. A theoretical result is established showing that maximizing its objective function corresponds to correctly classifying articles in a nearly separable corpus.
Web-based Supplementary Materials for “Biclustering via Sparse Singular Value Decomposition”
This section reports one additional simulation study where the nonzero entries of the true signal matrix are all the same, and investigates how the adaptive lasso weight parameters γ1 and γ2 affect the performance of SSVD.
A novel framework based on biclustering for automatic epileptic seizure detection
The results indicate that the proposed framework can not only detect or predict epileptic seizures automatically, with high accuracy, robustness and efficiency, but also implicitly provide valuable knowledge for studying the mechanisms of epilepsy.
On the Complexity of Nonnegative Matrix Factorization
An exact version of nonnegative matrix factorization is defined. It is established that this problem is equivalent to a problem in polyhedral combinatorics, that it is NP-hard, and that a polynomial-time local search heuristic exists.
Finding Approximately Rank-One Submatrices with the Nuclear Norm and ℓ1-Norm
This work proposes a convex optimization formulation with the nuclear norm and $\ell_1$-norm to find a large approximately rank-one submatrix of a given nonnegative matrix and establishes conditions under which the optimal solution of the convex formulation has a specific sparse structure.
Best Nonnegative Rank-One Approximations of Tensors
A Positivstellensatz is given for this class of polynomial optimization problems, based on which a globally convergent hierarchy of doubly nonnegative (DNN) relaxations is proposed, and it is shown that this approach is quite promising.


Biclustering algorithms for biological data analysis: a survey
In this comprehensive survey, a large number of existing approaches to biclustering are analyzed and classified according to the type of biclusters they can find, the patterns of biclusters that are discovered, the methods used to perform the search, the approaches used to evaluate the solution, and the target applications.
Using Machine Learning to Design and Interpret Gene-Expression Microarrays
Microarray technology, the data it produces, and the types of machine learning tasks that naturally arise with these data are described, along with additional types of interesting data that recent advances in biotechnology allow biomedical researchers to collect.
Spectral biclustering of microarray data: coclustering genes and conditions.
This work develops a method that simultaneously clusters genes and conditions, finding distinctive "checkerboard" patterns in matrices of gene expression data, if they exist, and applies it to a selection of publicly available cancer expression data sets.
Class prediction and discovery using gene microarray and proteomics mass spectroscopy data: curses, caveats, cautions
This work shows for several publicly available microarray and proteomics datasets how the 'curse of dimensionality' and dataset sparsity influence classification outcomes, and suggests an approach to assess the relative quality of apparently equally good classifiers.
A systematic comparison and evaluation of biclustering methods for gene expression data
A methodology for comparing and validating biclustering methods is introduced. It includes a simple binary reference model that captures the essential features of most biclustering approaches, and a fast divide-and-conquer algorithm (Bimax) is proposed.
'Gene shaving' as a method for identifying distinct sets of genes with similar expression patterns
The gene shaving method is a potentially useful tool for exploration of gene expression data and identification of interesting clusters of genes worth further investigation.
Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays.
  • U. Alon, N. Barkai, A. Levine
  • Biology
    Proceedings of the National Academy of Sciences of the United States of America
  • 1999
A two-way clustering method is reported for analyzing a data set consisting of the expression patterns of different cell types, revealing broad coherent patterns that suggest a high degree of organization underlying gene expression in these tissues.
Diagnosis of multiple cancer types by shrunken centroids of gene expression
The method of “nearest shrunken centroids” identifies subsets of genes that best characterize each class, which was highly efficient in finding genes for classifying small round blue cell tumors and leukemias.
Unsupervised Feature Selection Via Two-way Ordering in Gene Expression Analysis
A new method to select relevant genes based on their similarity information only is proposed and studied, which outperforms the baseline algorithm that simply uses all genes, and it also selects relevant genes close to those selected using supervised methods.
Singular value decomposition for genome-wide expression data processing and modeling.
Using singular value decomposition to transform genome-wide expression data from genes × arrays space to a reduced, diagonalized "eigengenes" × "eigenarrays" space gives a global picture of the dynamics of gene expression. In this picture, individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype.
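The "Diagnosis of multiple cancer types by shrunken centroids of gene expression" entry above describes the kind of baseline classifier a bicluster-based method is compared against. Below is a minimal pure-Python sketch of nearest shrunken centroids, written from the standard formulation as we understand it:

- Each class centroid's standardized difference from the overall centroid is soft-thresholded by `delta`.
- The offset s0 is the median per-gene pooled standard deviation.
- A new sample goes to the class with the smallest standardized squared distance, corrected by the log class prior.

The function names `fit_nsc` and `predict_nsc` and the toy data are hypothetical illustrations, not code from either paper.

```python
import math
from statistics import median


def fit_nsc(X, y, delta):
    """Nearest shrunken centroids (sketch). X: list of samples (lists of gene
    values), y: class labels, delta: shrinkage threshold.
    Assumes n > number of classes and nonzero within-class spread."""
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    members = {k: [j for j in range(n) if y[j] == k] for k in classes}
    overall = [sum(X[j][i] for j in range(n)) / n for i in range(p)]
    cent = {k: [sum(X[j][i] for j in members[k]) / len(members[k])
                for i in range(p)] for k in classes}
    # Pooled within-class standard deviation s_i of each gene.
    s = [math.sqrt(sum((X[j][i] - cent[k][i]) ** 2
                       for k in classes for j in members[k]) / (n - len(classes)))
         for i in range(p)]
    s0 = median(s)  # positive offset guarding against tiny s_i
    shrunk = {}
    for k in classes:
        m_k = math.sqrt(1 / len(members[k]) - 1 / n)
        row = []
        for i in range(p):
            scale = m_k * (s[i] + s0)
            d = (cent[k][i] - overall[i]) / scale  # standardised difference
            d_soft = math.copysign(max(abs(d) - delta, 0.0), d)  # soft threshold
            row.append(overall[i] + scale * d_soft)
        shrunk[k] = row
    log_prior = {k: math.log(len(members[k]) / n) for k in classes}
    return shrunk, [si + s0 for si in s], log_prior


def predict_nsc(model, x):
    """Class with the smallest standardised squared distance to its shrunken
    centroid, minus twice the log class prior."""
    shrunk, scale, log_prior = model

    def score(k):
        dist = sum((xi - ci) ** 2 / si ** 2
                   for xi, ci, si in zip(x, shrunk[k], scale))
        return dist - 2 * log_prior[k]

    return min(shrunk, key=score)


# Toy data: gene 0 separates the classes, genes 1 and 2 are noise.
X = [[1.0, 0.5, 0.2], [1.2, 0.4, 0.1], [0.9, 0.6, 0.3],
     [-1.0, 0.5, 0.2], [-1.1, 0.6, 0.1], [-0.8, 0.4, 0.3]]
y = ["A", "A", "A", "B", "B", "B"]
model = fit_nsc(X, y, delta=1.0)
print(predict_nsc(model, [0.8, 0.5, 0.2]))  # A
```

With a large enough `delta`, every class centroid collapses onto the overall centroid. The genes that survive shrinkage are the ones that "best characterize each class".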