Corpus ID: 236635017

Finding Stable Groups of Cross-Correlated Features in Two Data Sets With Common Samples

Miheer Dewaskar, John Palowitch, Mark He, Michael I. Love, Andrew B. Nobel

Data sets in which measurements of different types are obtained from a common set of samples appear in many scientific applications. In the analysis of such data, an important problem is to identify groups of features from different data types that are strongly associated. Given two data types, a bimodule is a pair (A,B) of feature sets from the two types such that the aggregate cross-correlation between the features in A and those in B is large. A bimodule (A,B) is stable if A coincides with… 
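
The bimodule definition above can be illustrated with a minimal sketch. The abstract does not state here how the cross-correlations are aggregated, so the sketch assumes the sum of squared sample correlations between the features in A and those in B; that choice, the function names, and the simulated data are illustrative assumptions, not the authors' procedure.

```python
import numpy as np

def cross_correlation(X, Y):
    """Sample cross-correlation matrix between the columns of X (n x p) and Y (n x q).

    Rows of X and Y are the common samples. Assumes no column is constant.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Xs = Xc / np.linalg.norm(Xc, axis=0)
    Ys = Yc / np.linalg.norm(Yc, axis=0)
    return Xs.T @ Ys  # entry (i, j) is corr(X[:, i], Y[:, j])

def aggregate_cross_correlation(X, Y, A, B):
    """Sum of squared correlations between features A of X and features B of Y.

    The sum-of-squares aggregate is an assumption made for this sketch.
    """
    R = cross_correlation(X, Y)
    return float(np.sum(R[np.ix_(A, B)] ** 2))

# Simulated example: a shared latent signal drives the first 3 features of X
# and the first 2 features of Y; all remaining features are pure noise.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
X = rng.normal(size=(n, 6))
Y = rng.normal(size=(n, 5))
X[:, :3] = z[:, None] + 0.5 * rng.normal(size=(n, 3))
Y[:, :2] = z[:, None] + 0.5 * rng.normal(size=(n, 2))

score_signal = aggregate_cross_correlation(X, Y, [0, 1, 2], [0, 1])
score_noise = aggregate_cross_correlation(X, Y, [3, 4, 5], [2, 3, 4])
print(score_signal, score_noise)
```

In this simulation the pair ([0, 1, 2], [0, 1]) plays the role of a candidate bimodule, while the noise-only pair serves as a baseline for comparison.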

Finding large average submatrices in high dimensional data
A statistically motivated biclustering procedure that finds large average submatrices within a given real-valued data matrix and is driven by a Bonferroni-based significance score that effectively trades off between submatrix size and average value is proposed.
Comparison and evaluation of integrative methods for the analysis of multilevel omics data: a study based on simulated and experimental cancer data
This paper presents a comprehensive comparison of three integrative analysis approaches, sparse canonical correlation analysis (sCCA), non-negative matrix factorization (NMF) and logic data mining MicroArray Logic Analyzer (MALA), by applying them to simulated and experimental omics data and shows that MALA performs best in terms of sample classification accuracy.
Multi-omics integration - a comparison of unsupervised clustering methodologies
The impact of factors such as data preprocessing, the choice of integration method, and the number of different omics considered is explored for the problem of sample classification by comparing the performance of five unsupervised algorithms.
The co-inertia criterion for measuring the adequacy between two data sets is presented and can be easily extended to the cases of distance matrices or to the case of more than two tables.
Consistency and overfitting of multi-omics methods on experimental data
A comparison of sparse multiple canonical correlation analysis (Sparse mCCA), angle-based joint and individual variation explained (AJIVE) and multi-omics factor analysis (MOFA) using a cross-validation approach to assess overfitting and consistency is presented.
Multitable Methods for Microbiome Data Integration
The purpose here is to distill relevant themes across different analysis approaches and provide concrete workflows for approaching analysis, as a function of ultimate analysis goals and data characteristics (heterogeneity, dimensionality, sparsity).
Multimodal Data Fusion: An Overview of Methods, Challenges, and Prospects
This paper aims to give the reader a sense of the vastness of the field and of the prospects and opportunities it holds. A number of data-driven solutions based on matrix and tensor decompositions are discussed, with emphasis on how they account for diversity across the data sets.
Dimension reduction techniques for the integrative analysis of multi-omics data
This work explores dimension reduction techniques as one of the emerging approaches for data integration, and how these can be applied to increase the understanding of biological systems in normal physiological function and disease.
Exploring regulation in tissues with eQTL networks
It is shown, in 13 tissues, that these eQTL networks are organized into dense, highly modular communities grouping genes often involved in coherent biological processes, which provide unique insight into the genotype–phenotype relationship.
Align human interactome with phenome to identify causative genes and networks underlying disease families
This work performs the first heterogeneous alignment of human interactome and phenome via a network alignment technique and proposes AlignPI, an alignment-based framework to predict disease genes, and identify plausible candidates for 70 diseases.