Corpus ID: 88517403

Integrated Principal Components Analysis

Tiffany M. Tang and Genevera I. Allen. Integrated Principal Components Analysis. Journal of Machine Learning Research.
Data integration, or the strategic analysis of multiple sources of data simultaneously, can often lead to discoveries that may be hidden in individualistic analyses of a single data source. We develop a new unsupervised data integration method named Integrated Principal Components Analysis (iPCA), which is a model-based generalization of PCA and serves as a practical tool to find and visualize common patterns that occur in multiple data sets. The key idea driving iPCA is the matrix-variate… 
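The abstract's central idea, common patterns shared across several data sets measured on the same samples, can be illustrated numerically. The sketch below conveys only the intuition, not the authors' matrix-variate estimator, and every variable name is invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50                             # samples shared as rows across both data sets
U_true = rng.normal(size=(n, 2))   # hidden common row patterns

# Two data sets with different features but the same row structure
X1 = U_true @ rng.normal(size=(2, 30)) + 0.1 * rng.normal(size=(n, 30))
X2 = U_true @ rng.normal(size=(2, 80)) + 0.1 * rng.normal(size=(n, 80))

# Crude shared-pattern estimate: eigendecompose the summed Gram matrices.
# (iPCA fits a matrix-variate model; this only conveys the intuition.)
G = X1 @ X1.T / 30 + X2 @ X2.T / 80
_, vecs = np.linalg.eigh(G)
shared = vecs[:, -2:]              # top-2 estimated common directions

# The estimated 2-dim subspace should nearly contain the true patterns
proj = shared @ shared.T
rel_err = np.linalg.norm(proj @ U_true - U_true) / np.linalg.norm(U_true)
print(rel_err)
```

With the signal this strong relative to the noise, the projection of the true patterns onto the estimated subspace recovers them almost exactly.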


Integrative Generalized Convex Clustering Optimization and Feature Selection for Mixed Multi-View Data
The iGecco+ approach selects the features from each data view that best determine the groups, often leading to improved integrative clustering, and introduces a new type of generalized multi-block ADMM algorithm that uses sub-problem approximations to fit the model more efficiently for big data sets.
Stepwise Covariance-Free Common Principal Components (CF-CPC) With an Application to Neuroscience
A covariance-free stepwise CPC that requires only O(kn) memory, where n is the total number of examples, is proposed; it allows extracting the shared anatomical structure of EEG and MEG source spectra across a frequency range of 0.01–40 Hz.
Integrative analysis of multi-omics data improves model predictions: an application to lung cancer
This work shows that an integrative analysis preserving both components of variation is more appropriate than analyses that consider only individual or only joint components, and that both kinds of components contribute to higher-quality model predictions and facilitate the interpretation of the underlying biological processes.
Computationally Efficient Learning of Statistical Manifolds
It is demonstrated how underlying structures in high dimensional data, including anomalies, can be visualized and identified, in a way that is scalable to large datasets, and is robust to different manifold learning algorithms and different approximate nearest neighbor algorithms.
Integrative, multi-omics, analysis of blood samples improves model predictions: applications to cancer
This work identifies joint and individual contributions of DNA methylation, miRNA and mRNA expression collected from blood samples in a lung cancer case–control study nested within the Norwegian Women and Cancer (NOWAC) cohort study, and uses such components to build prediction models for case–control and metastatic status.
Stacked Autoencoder Based Multi-Omics Data Integration for Cancer Survival Prediction
This paper proposes a novel method to integrate multi-omics data for cancer survival prediction, called Stacked AutoEncoder-based Survival Prediction Neural Network (SAEsurv-net), which addresses the curse of dimensionality with a two-stage dimensionality reduction strategy and handles multi-omics heterogeneity with a stacked autoencoder model.
No-go Theorem for Acceleration in the Hyperbolic Plane
It is proved that in a noisy setting, there is no analogue of accelerated gradient descent for geodesically convex functions on the hyperbolic plane.
Principal Components Along Quiver Representations
Quiver representations arise naturally in many areas across mathematics. Here we describe an algorithm for calculating the vector space of sections, or compatible assignments of vectors to vertices, …
A No-go Theorem for Robust Acceleration in the Hyperbolic Plane
In recent years there has been significant effort to adapt the key tools and ideas in convex optimization to the Riemannian setting. One key challenge has remained: Is there a Nesterov-like …


Distributed estimation of principal eigenspaces.
It is shown that when the number of machines is not unreasonably large, the distributed PCA performs as well as the whole-sample PCA, even without full access to the whole data.
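The divide-and-conquer scheme behind this result can be sketched as follows. In this simplified version, each machine sends its local top-k eigenspace as a projection matrix and the center eigendecomposes the average; names and constants are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, m, n_per = 20, 3, 4, 200    # dims, components, machines, rows per machine

# Common low-rank-plus-noise population shared by every machine
W = rng.normal(size=(d, k))
def sample(n):
    return rng.normal(size=(n, k)) @ W.T + 0.1 * rng.normal(size=(n, d))

# Each machine computes its local top-k eigenspace as a projection matrix
local_projs = []
for _ in range(m):
    X = sample(n_per)
    _, vecs = np.linalg.eigh(X.T @ X / n_per)
    V = vecs[:, -k:]                       # top-k eigenvectors (eigh sorts ascending)
    local_projs.append(V @ V.T)            # projections are sign/rotation free

# The center averages the projections and re-eigendecomposes
_, vecs = np.linalg.eigh(sum(local_projs) / m)
V_hat = vecs[:, -k:]

# Compare against a single full-size PCA (a fresh full-size sample
# stands in for "the whole data" here)
X_all = sample(m * n_per)
_, vecs = np.linalg.eigh(X_all.T @ X_all / (m * n_per))
V_full = vecs[:, -k:]
dist = np.linalg.norm(V_hat @ V_hat.T - V_full @ V_full.T)
print(dist)
```

Averaging projection matrices rather than raw eigenvectors sidesteps the sign and rotation ambiguity of each machine's local basis.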
Joint and Individual Variation Explained (JIVE) for Integrated Analysis of Multiple Data Types
JIVE quantifies the amount of joint variation between data types, reduces the dimensionality of the data, and provides new directions for the visual exploration of joint and individual structure.
Simulations and results on microarray data and the Netflix data show that these imputation techniques often outperform existing methods and offer a greater degree of flexibility.
Multiple factor analysis: principal component analysis for multitable and multiblock data sets
This article presents MFA, reviews recent extensions, and illustrates it with a detailed example showing that the common factor scores can be obtained by replacing the original normalized data tables with the normalized factor scores obtained from the PCA of each of these tables.
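The MFA normalization described here is easy to reproduce: center each table, divide it by its first singular value so that no single block dominates, concatenate, and run an ordinary PCA. A minimal sketch on synthetic tables (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
X1 = rng.normal(size=(n, 5))     # first data table
X2 = rng.normal(size=(n, 12))    # second data table

def mfa_weight(X):
    """First singular value of the centered table (MFA's block weight)."""
    Xc = X - X.mean(axis=0)
    return np.linalg.svd(Xc, compute_uv=False)[0]

# MFA: divide each centered table by its first singular value, then run
# ordinary PCA on the concatenated, weighted tables.
blocks = []
for X in (X1, X2):
    Xc = X - X.mean(axis=0)
    blocks.append(Xc / mfa_weight(X))
Z = np.hstack(blocks)

U, s, Vt = np.linalg.svd(Z, full_matrices=False)
common_scores = U[:, :2] * s[:2]   # common factor scores of the samples
print(common_scores.shape)         # → (40, 2)
```

Scaling by the first singular value equalizes each block's leading direction of variance, which is precisely what prevents a large table from swamping a small one.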
Robust Kronecker Product PCA for Spatio-Temporal Covariance Estimation
A robust PCA-based algorithm is introduced to estimate the covariance under the Kronecker PCA model, and an extension to Toeplitz temporal factors is provided, producing a parameter reduction for temporally stationary measurement modeling.
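A basic (non-robust) way to fit a Kronecker-structured covariance is Van Loan's rearrangement: a perfect Kronecker product Sigma = A ⊗ B rearranges into a rank-one matrix, so the best rank-one SVD approximation of the rearrangement yields factor estimates. The sketch below uses that simpler model, not the robust algorithm of this paper:

```python
import numpy as np

rng = np.random.default_rng(3)
p, q = 4, 3                        # e.g. spatial and temporal dimensions

# Ground-truth Kronecker-structured covariance: Sigma = A0 kron B0
A0 = rng.normal(size=(p, p)); A0 = A0 @ A0.T + p * np.eye(p)
B0 = rng.normal(size=(q, q)); B0 = B0 @ B0.T + q * np.eye(q)
Sigma = np.kron(A0, B0) + 0.01 * rng.normal(size=(p * q, p * q))

# Van Loan rearrangement: each q x q block becomes one row, so a perfect
# Kronecker product rearranges into a rank-1 matrix.
R = np.empty((p * p, q * q))
for i in range(p):
    for j in range(p):
        R[i * p + j] = Sigma[i * q:(i + 1) * q, j * q:(j + 1) * q].ravel()

# The best rank-1 approximation of R gives the Kronecker factor estimates
U, s, Vt = np.linalg.svd(R)
A_hat = (np.sqrt(s[0]) * U[:, 0]).reshape(p, p)
B_hat = (np.sqrt(s[0]) * Vt[0]).reshape(q, q)

rel_err = np.linalg.norm(np.kron(A_hat, B_hat) - Sigma) / np.linalg.norm(Sigma)
print(rel_err)
```

The sign ambiguity of the singular vectors cancels in the product A_hat ⊗ B_hat, so the reconstruction is well defined even though the individual factors are only identified up to scale.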
Structure-revealing data fusion
A structure-revealing data fusion model that can jointly analyze heterogeneous, incomplete data sets with shared and unshared components is proposed and its promising performance as well as potential limitations on both simulated and real data are demonstrated.
Sparse permutation invariant covariance estimation
A method is proposed for constructing a sparse estimator of the inverse covariance (concentration) matrix in high-dimensional settings; it uses a penalized normal likelihood approach and enforces sparsity through a lasso-type penalty.
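A closely related estimator in the same family, the graphical lasso, follows the same penalized-normal-likelihood-plus-lasso-penalty recipe. A minimal sketch using scikit-learn's GraphicalLasso (assuming scikit-learn is installed; the chain-graph data are synthetic):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(4)
p = 5

# Sparse ground-truth precision matrix: a chain graph 0-1-2-3-4
Theta = np.eye(p)
for i in range(p - 1):
    Theta[i, i + 1] = Theta[i + 1, i] = 0.4
Cov = np.linalg.inv(Theta)

# Draw samples and fit an l1-penalized (lasso-type) precision estimate
X = rng.multivariate_normal(np.zeros(p), Cov, size=2000)
model = GraphicalLasso(alpha=0.05).fit(X)

# Entries absent from the chain should be shrunk toward zero, so the
# true edge (0,1) should dominate the non-edge (0,3)
est = np.abs(model.precision_)
print(est[0, 1] > est[0, 3])
```

The penalty level alpha trades off sparsity against fidelity: larger values zero out more entries of the estimated concentration matrix.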
Orthogonal Sparse PCA and Covariance Estimation via Procrustes Reformulation
Numerical experiments show that the proposed eigenvector extraction algorithm outperforms existing algorithms in terms of support recovery and explained variance, whereas the covariance estimation algorithms improve the sample covariance estimator significantly.
Gemini: Graph estimation with matrix variate normal instances
This paper develops new methods for estimating the graphical structures and underlying parameters, namely the row and column covariance and inverse covariance matrices, from matrix-variate data, and provides simulation evidence that one can recover the graphical structures and estimate the precision matrices, as predicted by theory.
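The matrix-variate normal model underlying this line of work factors the covariance into row and column pieces, which are classically estimated by an alternating ("flip-flop") scheme. The sketch below uses repeated matrix samples for simplicity, unlike Gemini's single-instance setting, so it illustrates the model rather than the paper's method:

```python
import numpy as np

rng = np.random.default_rng(5)
r, c, N = 4, 6, 500

# True row covariance (diagonal, for an easy check) and identity column covariance
A0 = np.diag([1.0, 2.0, 3.0, 4.0])
A0_sqrt = np.diag(np.sqrt(np.diag(A0)))
samples = [A0_sqrt @ rng.normal(size=(r, c)) for _ in range(N)]

# Flip-flop MLE: alternate the row- and column-covariance updates
A = np.eye(r); B = np.eye(c)
for _ in range(10):
    Binv = np.linalg.inv(B)
    A = sum(X @ Binv @ X.T for X in samples) / (N * c)
    Ainv = np.linalg.inv(A)
    B = sum(X.T @ Ainv @ X for X in samples) / (N * r)

# A kron B is identified only up to a scale swap, so compare after rescaling
scale = A0[0, 0] / A[0, 0]
err = np.linalg.norm(scale * A - A0) / np.linalg.norm(A0)
print(err)
```

Because only the product A ⊗ B is identifiable, any sensible comparison must first fix the scale, as done above by pinning one diagonal entry.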
Analysis of multiblock and hierarchical PCA and PLS models
It is recommended that, in cases where the variables can be separated into meaningful blocks, the standard PCA and PLS methods be used to build the models, and that the weights and loadings of the individual blocks and the super block, together with the percentage of variation explained in each block, then be calculated from the results.
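The recommended workflow, one standard PCA on the concatenated blocks followed by per-block summaries computed from the global fit, can be sketched as follows (block sizes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 30
blocks = [rng.normal(size=(n, 4)), rng.normal(size=(n, 7))]  # meaningful variable blocks

# Standard PCA on the concatenated (centered) data
X = np.hstack([b - b.mean(axis=0) for b in blocks])
U, s, Vt = np.linalg.svd(X, full_matrices=False)
scores = U[:, :2] * s[:2]                  # super scores, first two components

# Per-block percentage of variation explained by those components,
# computed afterwards from the single global fit
start = 0
for b_idx, b in enumerate(blocks):
    w = b.shape[1]
    V_b = Vt[:2, start:start + w]          # global loadings restricted to this block
    recon = scores @ V_b                   # this block's part of the rank-2 fit
    Xb = X[:, start:start + w]
    pct = 100 * (1 - np.linalg.norm(Xb - recon) ** 2 / np.linalg.norm(Xb) ** 2)
    print(f"block {b_idx}: {pct:.1f}% explained")
    start += w
```

Since the rank-2 fit is an orthogonal projection of the rows, each block's residual is orthogonal to its fitted part, so the per-block percentage always lands between 0 and 100.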