Meta Sparse Principal Component Analysis

Imon Banerjee and Jean Honorio
We study meta-learning for support (i.e., the set of non-zero entries) recovery in high-dimensional Principal Component Analysis. We reduce the sufficient sample complexity in a novel task using information learned from auxiliary tasks. We assume each task to be a different random Principal Component (PC) matrix with a possibly different support, and that the support union of the PC matrices is small. We then pool the data from all the tasks to execute an improper estimation of a…

Meta Learning for Support Recovery in High-dimensional Precision Matrix Estimation

This paper proposes pooling all the samples from different tasks and estimating a single precision matrix by minimizing the $\ell_1$-regularized log-determinant Bregman divergence, and proves a matching information-theoretic lower bound for the necessary number of samples.

The Sample Complexity of Meta Sparse Regression

This paper addresses meta-learning in sparse linear regression with infinitely many tasks, allows $l$ to be constant with respect to $T$ (i.e., few-shot learning), and proves that the rates are minimax optimal.

High-dimensional analysis of semidefinite relaxations for sparse principal components

This paper analyzes a simple and computationally inexpensive diagonal cut-off method and establishes a threshold of the order $\theta_{\mathrm{diag}} = n/[k^2 \log(p-k)]$ separating success from failure. It also proves that a more complex semidefinite programming (SDP) relaxation due to d'Aspremont et al. succeeds once the sample size is of the order $\theta_{\mathrm{sdp}}$.

On Consistency and Sparsity for Principal Components Analysis in High Dimensions

  • I. Johnstone, A. Lu
  • Computer Science, Mathematics
    Journal of the American Statistical Association
  • 2009
A simple algorithm for selecting a subset of coordinates with largest sample variances is provided, and it is shown that if PCA is done on the selected subset, then consistency is recovered, even if p(n) ≫ n.
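
The selection step described above can be sketched in a few lines. This is a minimal illustration and not the authors' code: it assumes a rank-one spiked model, takes the subset size `k` as given, and uses plain power iteration for the PCA step. The helpers `make_spiked_data`, `sample_covariance`, `leading_eigenvector`, and `diagonal_threshold_pca` are hypothetical names introduced only for this example.

```python
import random


def make_spiked_data(n, p, support, beta, seed=0):
    """Hypothetical helper: draw n samples x = sqrt(beta) * z * u + noise,
    where u is a unit vector that is constant on `support` and zero elsewhere."""
    rng = random.Random(seed)
    u = [0.0] * p
    for j in support:
        u[j] = 1.0 / len(support) ** 0.5
    X = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        X.append([beta ** 0.5 * z * u[j] + rng.gauss(0.0, 1.0) for j in range(p)])
    return X


def sample_covariance(X, cols):
    """Sample covariance matrix restricted to the columns listed in `cols`."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in cols]
    centered = [[row[j] - m for j, m in zip(cols, means)] for row in X]
    d = len(cols)
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
            for a in range(d)]


def leading_eigenvector(S, iters=200, seed=0):
    """Plain power iteration on a symmetric positive semidefinite matrix."""
    rng = random.Random(seed)
    d = len(S)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def diagonal_threshold_pca(X, k):
    """Keep the k coordinates with the largest sample variance (assumed known k),
    run PCA on that subset only, and embed the result back into p dimensions."""
    p = len(X[0])
    variances = [sample_covariance(X, [j])[0][0] for j in range(p)]
    selected = sorted(sorted(range(p), key=lambda j: variances[j], reverse=True)[:k])
    v_sub = leading_eigenvector(sample_covariance(X, selected))
    v = [0.0] * p
    for idx, j in enumerate(selected):
        v[j] = v_sub[idx]
    return selected, v


X = make_spiked_data(n=300, p=20, support=[0, 1, 2], beta=9.0)
selected, v = diagonal_threshold_pca(X, k=3)
print(selected)
```

In practice the subset size is not known in advance; a data-driven variance threshold would replace the fixed `k` used here.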

Sparse Principal Component Analysis with Missing Observations

The first information-theoretic lower bound for the sparse PCA problem with missing observations is established, and the properties of a BIC-type estimator that does not require any prior knowledge of the sparsity of the unknown first principal component or any imputation of the missing observations are studied.


Even though the Lasso cannot recover the correct sparsity pattern, the estimator is still consistent in the $\ell_2$-norm sense for fixed designs under conditions on (a) the number $s_n$ of non-zero components of the vector $\beta_n$ and (b) the minimal singular values of the design matrices that are induced by selecting on the order of $s_n$ variables.

Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA

A novel convex relaxation of sparse principal subspace estimation based on the convex hull of rank-d projection matrices (the Fantope) is proposed. The analysis implies the near-optimality of DSPCA (d'Aspremont et al. [1]) even when the solution is not rank 1.

Finite sample approximation results for principal component analysis: a matrix perturbation approach

A matrix perturbation view of the "phase transition phenomenon," and a simple linear-algebra based derivation of the eigenvalue and eigenvector overlap in this asymptotic limit of finite sample PCA are presented.

Sparse principal component analysis by choice of norm

Sparse Principal Component Analysis and Iterative Thresholding

Under a spiked covariance model, a new iterative thresholding approach is proposed for estimating principal subspaces when the leading eigenvectors are sparse. The new approach is found to recover the principal subspace and leading eigenvectors consistently, and even optimally, in a range of high-dimensional sparse settings.
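
To make the idea of iterative thresholding concrete, here is a simplified single-vector sketch: a power iteration in which small entries are zeroed out after every multiplication. It is an illustration under stated assumptions and not the paper's algorithm, which targets whole subspaces. The hard-threshold level `gamma`, the initialization at the highest-variance coordinate, and the helper names `make_spiked_data`, `full_sample_covariance`, and `thresholded_power_iteration` are all hypothetical choices made for this example.

```python
import random


def make_spiked_data(n, p, support, beta, seed=0):
    """Hypothetical helper: draw n samples x = sqrt(beta) * z * u + noise,
    where u is a unit vector that is constant on `support` and zero elsewhere."""
    rng = random.Random(seed)
    u = [0.0] * p
    for j in support:
        u[j] = 1.0 / len(support) ** 0.5
    X = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        X.append([beta ** 0.5 * z * u[j] + rng.gauss(0.0, 1.0) for j in range(p)])
    return X


def full_sample_covariance(X):
    """Sample covariance matrix of all p coordinates."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in X]
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
            for a in range(p)]


def thresholded_power_iteration(S, gamma, iters=50):
    """Power iteration with a hard-thresholding step after each multiply."""
    p = len(S)
    # Assumed initialization: the coordinate with the largest sample variance.
    start = max(range(p), key=lambda j: S[j][j])
    v = [0.0] * p
    v[start] = 1.0
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(p)) for i in range(p)]
        # Zero out entries whose magnitude falls below gamma.
        w = [x if abs(x) >= gamma else 0.0 for x in w]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            raise ValueError("threshold gamma removed every coordinate")
        v = [x / norm for x in w]
    return v


X = make_spiked_data(n=400, p=20, support=[0, 1, 2], beta=9.0)
v = thresholded_power_iteration(full_sample_covariance(X), gamma=1.0)
print([j for j, x in enumerate(v) if x != 0.0])
```

The value `gamma=1.0` is hand-picked for this synthetic example; choosing the threshold from the data is the part a real method has to get right.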