Corpus ID: 236635017

Finding Stable Groups of Cross-Correlated Features in Two Data Sets With Common Samples

Andrew B. Nobel
Data sets in which measurements of different types are obtained from a common set of samples appear in many scientific applications. In the analysis of such data, an important problem is to identify groups of features from different data types that are strongly associated. Given two data types, a bimodule is a pair (A,B) of feature sets from the two types such that the aggregate cross-correlation between the features in A and those in B is large. A bimodule (A,B) is stable if A coincides with… 
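To make the idea of aggregate cross-correlation concrete, here is a minimal sketch that scores a candidate bimodule (A, B) from two data matrices whose rows are the common samples. It assumes the aggregate is the sum of squared Pearson correlations over all pairs in A × B. That choice and the function names are illustrative assumptions, and the paper's own statistic and stability criterion may differ.

```python
import numpy as np


def cross_correlation(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Pearson correlations between every column of X and every column of Y.

    X is n x p and Y is n x q, with rows indexing the same n samples.
    Returns a p x q matrix R with R[s, t] = corr(X[:, s], Y[:, t]).
    Assumes no column is constant.
    """
    n = X.shape[0]
    if Y.shape[0] != n:
        raise ValueError("X and Y must share the same samples (rows)")
    # Standardize each column to mean 0 and (population) standard deviation 1.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    return Xs.T @ Ys / n


def aggregate_cross_correlation(X, Y, A, B) -> float:
    """Hypothetical bimodule score: sum of squared correlations over A x B."""
    R = cross_correlation(X, Y)
    sub = R[np.ix_(list(A), list(B))]
    return float(np.sum(sub ** 2))


# Toy data: the first two features of each type share a latent signal z.
rng = np.random.default_rng(0)
n = 200
z = rng.standard_normal(n)
X = rng.standard_normal((n, 5))
X[:, :2] += 2 * z[:, None]
Y = rng.standard_normal((n, 4))
Y[:, :2] += 2 * z[:, None]
print(aggregate_cross_correlation(X, Y, A=[0, 1], B=[0, 1]))  # large
print(aggregate_cross_correlation(X, Y, A=[3, 4], B=[2, 3]))  # near zero
```

Squaring keeps strong negative correlations from cancelling strong positive ones inside the block.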

Finding large average submatrices in high dimensional data

This paper proposes a statistically motivated biclustering procedure that finds large average submatrices within a given real-valued data matrix. The procedure is driven by a Bonferroni-based significance score that trades off submatrix size against average value.
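The size/average trade-off can be illustrated with a score of the form S = -log[ C(m, k) · C(n, l) · Φ(-τ√(kl)) ] for a k × l submatrix with average τ inside an m × n matrix of standardized entries. The binomial coefficients act as the Bonferroni correction over all submatrices of that size. This form is assumed here from the LAS literature and should be checked against the paper. The sketch below only evaluates the score; it does not search for submatrices, and the helper names are hypothetical.

```python
import math


def log_normal_upper_tail(x: float) -> float:
    """log of Phi(-x) = P(Z > x) for a standard normal Z."""
    tail = 0.5 * math.erfc(x / math.sqrt(2.0))
    if tail > 0.0:
        return math.log(tail)
    # erfc underflows for very large x; fall back on the standard asymptotic
    # log Phi(-x) ~ -x^2/2 - log(x * sqrt(2*pi)), valid as x -> infinity.
    return -0.5 * x * x - math.log(x * math.sqrt(2.0 * math.pi))


def log_binom(a: int, b: int) -> float:
    """log of the binomial coefficient C(a, b), via log-gamma."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def las_score(m: int, n: int, k: int, l: int, tau: float) -> float:
    """Assumed LAS-style score for a k x l submatrix with average tau.

    Larger scores mean the submatrix is less likely to arise by chance
    after a Bonferroni correction over all k x l submatrices of an m x n matrix.
    """
    log_count = log_binom(m, k) + log_binom(n, l)
    return -(log_count + log_normal_upper_tail(tau * math.sqrt(k * l)))


# With the same average, the larger block scores higher here because the
# evidence term grows faster than the multiple-testing penalty.
print(las_score(1000, 100, 10, 5, 1.0))
print(las_score(1000, 100, 40, 10, 1.0))
```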

Comparison and evaluation of integrative methods for the analysis of multilevel omics data: a study based on simulated and experimental cancer data

This paper presents a comprehensive comparison of three integrative analysis approaches: sparse canonical correlation analysis (sCCA), non-negative matrix factorization (NMF), and the logic data mining method MicroArray Logic Analyzer (MALA). Applying them to simulated and experimental omics data, it finds that MALA performs best in terms of sample classification accuracy.
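As a rough illustration of the first of these methods, the sketch below computes one pair of sparse canonical weight vectors by alternating soft-thresholded power iterations on the cross-product matrix C = XᵀY / n. It assumes column-standardized data and treats each data set's within-set covariance as the identity, a common simplification in penalized-matrix-decomposition formulations of sCCA. The relative threshold `lam` and the function names are hypothetical, and this is not the implementation evaluated in the paper.

```python
import numpy as np


def soft_threshold(a: np.ndarray, lam: float) -> np.ndarray:
    """Elementwise soft-thresholding: sign(a) * max(|a| - lam, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def normalize(a: np.ndarray) -> np.ndarray:
    """Scale to unit Euclidean norm, leaving an all-zero vector unchanged."""
    norm = np.linalg.norm(a)
    return a / norm if norm > 0 else a


def sparse_cca_rank1(X, Y, lam=0.1, n_iter=100, seed=0):
    """One sparse canonical pair (u, v) for column-standardized X (n x p), Y (n x q).

    Alternates u <- S(C v), v <- S(C^T u) with C = X^T Y / n, renormalizing
    each time. lam (hypothetical) is a threshold relative to the largest
    entry, so for lam < 1 at least one weight always survives.
    """
    n = X.shape[0]
    C = X.T @ Y / n
    rng = np.random.default_rng(seed)
    v = normalize(rng.standard_normal(Y.shape[1]))
    u = np.zeros(X.shape[1])
    for _ in range(n_iter):
        cu = C @ v
        u = normalize(soft_threshold(cu, lam * np.max(np.abs(cu))))
        cv = C.T @ u
        v = normalize(soft_threshold(cv, lam * np.max(np.abs(cv))))
    return u, v
```

Larger `lam` gives sparser weights; the penalized formulations in the literature usually tune the threshold to meet an L1 bound instead of fixing it.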

Multi-omics integration - a comparison of unsupervised clustering methodologies

This paper explores how factors such as data preprocessing, the choice of integration method, and the number of omics types considered affect sample classification, by comparing the performance of five unsupervised algorithms.
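Comparing unsupervised methods on sample classification requires scoring each clustering against the known sample classes. One widely used measure, assumed here since the paper's exact evaluation criteria are not given above, is the adjusted Rand index: it equals 1 for identical partitions regardless of label names and has expected value 0 under random labelings. A minimal, dependency-free sketch:

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred) -> float:
    """Adjusted Rand index between two labelings of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    total = comb(len(labels_true), 2)
    if total == 0:  # fewer than two samples: nothing to compare
        return 1.0
    # Contingency counts n_ij and their row/column marginals a_i, b_j.
    pairs = Counter(zip(labels_true, labels_pred))
    a = Counter(labels_true)
    b = Counter(labels_pred)
    sum_ij = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / total
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Same partition with different label names.
print(adjusted_rand_index([0, 0, 1, 1, 2, 2], ["b", "b", "a", "a", "c", "c"]))  # prints 1.0
```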

The Elements of Statistical Learning: Data Mining, Inference, and Prediction

This book is a valuable resource both for the statistician needing an introduction to machine learning and related fields and for the computer scientist wishing to learn more about statistics; statisticians will especially appreciate that it is written in their own language.


This paper presents the co-inertia criterion for measuring the adequacy (agreement) between two data sets and shows that it extends easily to distance matrices and to the case of more than two tables.
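For two column-centered tables X (n × p) and Y (n × q) on the same n samples, with uniform sample weights and identity column weights, the total co-inertia reduces to the sum of squared covariances ‖XᵀY / n‖²_F, and normalizing it gives the RV coefficient. The sketch below assumes those default weights; co-inertia analysis also allows general row and column weights, which are not shown.

```python
import numpy as np


def total_coinertia(X: np.ndarray, Y: np.ndarray) -> float:
    """Sum of squared covariances between columns of X and Y (uniform row weights)."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    return float(np.linalg.norm(Xc.T @ Yc / n, "fro") ** 2)


def rv_coefficient(X: np.ndarray, Y: np.ndarray) -> float:
    """Co-inertia normalized to [0, 1]: ||X'Y||_F^2 / (||X'X||_F * ||Y'Y||_F)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    num = np.linalg.norm(Xc.T @ Yc, "fro") ** 2
    den = np.linalg.norm(Xc.T @ Xc, "fro") * np.linalg.norm(Yc.T @ Yc, "fro")
    return float(num / den)
```

The RV coefficient is unchanged by rotating or uniformly rescaling either table, which is what makes it usable as an agreement measure between tables measured in different units.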

Consistency and overfitting of multi-omics methods on experimental data

This paper compares sparse multiple canonical correlation analysis (Sparse mCCA), angle-based joint and individual variation explained (AJIVE), and multi-omics factor analysis (MOFA), using a cross-validation approach to assess overfitting and consistency.
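The kind of overfitting a cross-validation design can expose already appears in plain two-block CCA. The sketch below is not Sparse mCCA, AJIVE or MOFA. It fits a ridge-regularized first canonical pair on half the samples of two independent pure-noise data sets with more features than samples, then evaluates the canonical correlation on the held-out half. The ridge parameter `reg`, the split, and the sizes are arbitrary illustrative choices.

```python
import numpy as np


def inv_sqrt(S: np.ndarray) -> np.ndarray:
    """Inverse symmetric square root of a positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T


def first_canonical_pair(X, Y, reg=1e-3):
    """Ridge-regularized CCA weights (a, b) from the top singular pair of
    Sxx^{-1/2} Sxy Syy^{-1/2}, with reg * I added to each within-set covariance."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / n
    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, _, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    return Kx @ U[:, 0], Ky @ Vt[0]


def canonical_corr(X, Y, a, b) -> float:
    """Pearson correlation between the canonical variates X a and Y b."""
    return float(np.corrcoef(X @ a, Y @ b)[0, 1])


# Two independent pure-noise data sets: the true canonical correlation is 0.
rng = np.random.default_rng(1)
n, p, q = 100, 80, 80
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))
train, test = np.arange(50), np.arange(50, n)
a, b = first_canonical_pair(X[train], Y[train])
train_corr = canonical_corr(X[train], Y[train], a, b)
test_corr = canonical_corr(X[test], Y[test], a, b)
print(f"train {train_corr:.2f}  held-out {test_corr:.2f}")
```

With 80 features and only 50 training samples, the training correlation is close to 1 even though the data contain no shared signal, while the held-out correlation stays near 0.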

Multitable Methods for Microbiome Data Integration

The paper distills relevant themes across different analysis approaches and provides concrete analysis workflows, chosen according to the ultimate analysis goals and the data characteristics (heterogeneity, dimensionality, sparsity).

Multimodal Data Fusion: An Overview of Methods, Challenges, and Prospects

The paper aims to give the reader a sense of the breadth of the field and of the prospects and opportunities it holds. It discusses a number of data-driven solutions based on matrix and tensor decompositions, emphasizing how they account for diversity across the data sets.
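As a small example of a matrix-decomposition approach to fusion, consider two data sets X (n × p) and Y (n × q) that share their n samples and are modeled with a common sample factor U, so that X ≈ UVᵀ and Y ≈ UWᵀ. Minimizing ‖X − UVᵀ‖²_F + ‖Y − UWᵀ‖²_F over rank-r factors is solved by a truncated SVD of the concatenation [X Y]. The sketch below is a generic illustration of that model, not a method taken from the paper. It ignores block scaling, which matters in practice when the blocks differ in size or variance.

```python
import numpy as np


def coupled_factorization(X: np.ndarray, Y: np.ndarray, rank: int):
    """Shared-sample-factor model X ~ U V^T, Y ~ U W^T fit by truncated SVD of [X Y]."""
    Z = np.hstack([X, Y])
    P, s, Qt = np.linalg.svd(Z, full_matrices=False)
    U = P[:, :rank] * s[:rank]  # shared sample scores (singular values absorbed here)
    loadings = Qt[:rank].T      # (p + q) x rank
    V, W = loadings[: X.shape[1]], loadings[X.shape[1]:]
    return U, V, W


# Exact rank-2 toy data with a shared sample factor.
rng = np.random.default_rng(0)
U_true = rng.standard_normal((30, 2))
X = U_true @ rng.standard_normal((2, 10))
Y = U_true @ rng.standard_normal((2, 6))
U, V, W = coupled_factorization(X, Y, rank=2)
print(np.abs(X - U @ V.T).max(), np.abs(Y - U @ W.T).max())  # both near zero
```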

Dimension reduction techniques for the integrative analysis of multi-omics data

This work explores dimension reduction techniques as one of the emerging approaches to data integration, and how they can be applied to improve the understanding of biological systems in normal physiological function and in disease.
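Many dimension reduction methods for integration first rescale each omics block so that no single data type dominates the joint decomposition. One such step, used in multiple factor analysis (MFA), divides each centered block by its largest singular value (equivalently, up to a constant shared by all blocks, by the square root of its first PCA eigenvalue) before a PCA of the concatenated blocks. The sketch below shows that step. Whether and how the reviewed methods weight blocks varies, and the function name and toy blocks are hypothetical.

```python
import numpy as np


def mfa_scores(blocks, n_components=2):
    """MFA-style joint PCA: center each block, scale it so its largest
    singular value is 1, concatenate, and return sample scores."""
    weighted = []
    for B in blocks:
        Bc = B - B.mean(axis=0)
        sigma1 = np.linalg.svd(Bc, compute_uv=False)[0]
        weighted.append(Bc / sigma1)
    Z = np.hstack(weighted)
    P, s, _ = np.linalg.svd(Z, full_matrices=False)
    return P[:, :n_components] * s[:n_components]


rng = np.random.default_rng(0)
rna = rng.standard_normal((40, 500))          # a wide block
methyl = 100 * rng.standard_normal((40, 20))  # a narrow block on a much larger scale
scores = mfa_scores([rna, methyl])
```

Because every block enters with the same leading singular value, neither the number of features in a block nor its measurement scale alone determines its influence on the joint components.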

Spatial Pattern Analysis of Functional Brain Images Using Partial Least Squares

Partial least squares serves as an important extension of existing methods, extracting information from imaging data that is not accessible through the univariate and multivariate image analysis tools currently in use.
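In the PLS approach to neuroimaging, the relation between image data and a second block (for example, task design or behavior) is summarized by a singular value decomposition of their cross-block correlation matrix, and the significance of each latent variable is commonly assessed by permutation. The sketch below shows a PLS-correlation analysis with a permutation test on the first singular value. The standardization, the function names, and the toy data are assumptions, and this is not the paper's code.

```python
import numpy as np


def zscore(A: np.ndarray) -> np.ndarray:
    """Standardize columns to mean 0 and (population) standard deviation 1."""
    return (A - A.mean(axis=0)) / A.std(axis=0)


def pls_svd(X: np.ndarray, Y: np.ndarray):
    """PLS correlation: SVD of the cross-correlation matrix R = Y^T X / n.

    X: n x voxels (images), Y: n x k (design or behavior).
    Returns singular values, Y saliences, voxel saliences and brain scores.
    """
    Xz, Yz = zscore(X), zscore(Y)
    R = Yz.T @ Xz / X.shape[0]
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    brain_scores = Xz @ Vt.T  # projection of each image onto the voxel saliences
    return s, U, Vt.T, brain_scores


def permutation_pvalue(X, Y, n_perm=500, seed=0) -> float:
    """Fraction of row permutations of Y whose first singular value is at
    least as large as the observed one (with the usual +1 correction)."""
    rng = np.random.default_rng(seed)
    observed = pls_svd(X, Y)[0][0]
    count = 0
    for _ in range(n_perm):
        s_perm = pls_svd(X, Y[rng.permutation(Y.shape[0])])[0]
        count += int(s_perm[0] >= observed)
    return (count + 1) / (n_perm + 1)


# Toy data: 10 of 200 "voxels" track a single behavioral measure y.
rng = np.random.default_rng(0)
n = 40
y = rng.standard_normal((n, 1))
X = rng.standard_normal((n, 200))
X[:, :10] += 0.8 * y
print(permutation_pvalue(X, y, n_perm=200))
```

Permuting the rows of Y breaks the sample correspondence while keeping each block's internal structure, so the permutation distribution reflects the largest singular value expected without any brain-behavior association.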