Sparse Principal Component Analysis with Missing Observations

  • Karim Lounici
  • Published 31 May 2012
  • Mathematics, Computer Science
  • arXiv: Statistics Theory
In this paper, we study the problem of sparse Principal Component Analysis (PCA) in the high-dimensional setting with missing observations. Our goal is to estimate the first principal component when we only have access to partial observations. Existing estimation techniques are usually derived for fully observed data sets and require prior knowledge of the sparsity of the first principal component in order to achieve good statistical guarantees. Our contribution is essentially theoretical in…
Sparse PCA with Oracle Property
It is proved that another estimator within the family achieves a sharper statistical rate of convergence than the standard semidefinite relaxation of sparse PCA, even when the previous assumption on the magnitude of the projection matrix is violated.
High-dimensional principal component analysis with heterogeneous missingness
An incoherence condition on the principal components is introduced and it is proved that in the noiseless case, the error of primePCA converges to zero at a geometric rate when the signal strength is not too small.
We study sparse principal components analysis in high dimensions, where p (the number of variables) can be much larger than n (the number of observations), and analyze the problem of estimating the
Sparse spectral estimation with missing and corrupted measurements
A convex method for sparse subspace estimation is extended to the case of missing and corrupted measurements by correcting the bias instead of imputing the missing values to improve the overall statistical performance.
Sparse PCA: Optimal rates and adaptive estimation
Under mild technical conditions, this paper establishes the optimal rates of convergence for estimating the principal subspace, which are sharp with respect to all the parameters, thus providing a complete characterization of the difficulty of the estimation problem in terms of the convergence rate.
Online sparse and orthogonal subspace estimation from partial information
  • Pengyu Xiao, L. Balzano
  • Computer Science
    2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
  • 2016
This work considers an online version of the sparse PCA problem with missing data in which they seek a set of sparse orthogonal basis vectors and proposes two different algorithms for solving this problem, where the main idea is to find a rotation matrix such that the subspace basis is sparse after rotation.
Noisy Matrix Completion Under Sparse Factor Models
This paper examines a general class of noisy matrix completion tasks, where the goal is to estimate a matrix from observations obtained at a subset of its entries, each of which is subject to random
Sparsistency and agnostic inference in sparse PCA
The properties of the recently proposed Fantope projection and selection (FPS) method in the high-dimensional setting are investigated and it is shown that FPS provides a sparse, linear dimension-reducing transformation that is close to the best possible in terms of maximizing the predictive covariance.
Minimax rate-optimal estimation of high-dimensional covariance matrices with incomplete data
A Literature Survey on High-Dimensional Sparse Principal Component Analysis
A comprehensive literature review of recent progress in high-dimensional sparse PCA, covering both algorithms and statistical theory, is given, along with future trends and challenges.
Generalized Power Method for Sparse Principal Component Analysis
A new approach to sparse principal component analysis (sparse PCA) is developed, aimed at extracting either a single sparse dominant principal component of a data matrix or several such components at once.
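As a rough illustration of the power-method family of sparse PCA algorithms, the sketch below alternates a multiplication by the covariance matrix with soft thresholding and renormalization. This is a minimal caricature of the idea, not the paper's GPower algorithm; `gamma` and `n_iter` are illustrative tuning parameters.

```python
import numpy as np

def gpower_sketch(Sigma, gamma=0.1, n_iter=200, seed=0):
    """Thresholded power iteration for one sparse principal component.

    Each step multiplies the current iterate by the covariance matrix,
    soft-thresholds the result, and renormalizes.  With gamma = 0 this
    reduces to plain power iteration; larger gamma zeroes out more
    coordinates.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(Sigma.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(n_iter):
        y = Sigma @ x
        # soft threshold: shrink toward zero, kill small coordinates
        y = np.sign(y) * np.maximum(np.abs(y) - gamma, 0.0)
        nrm = np.linalg.norm(y)
        if nrm == 0.0:  # gamma too large: everything was thresholded away
            return np.zeros_like(x)
        x = y / nrm
    return x
```

On a spiked covariance with a sparse leading eigenvector, the iterate concentrates on the true support while the remaining coordinates are driven to exactly zero.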
On Consistency and Sparsity for Principal Components Analysis in High Dimensions
  • I. Johnstone, A. Lu
  • Computer Science, Mathematics
    Journal of the American Statistical Association
  • 2009
A simple algorithm for selecting a subset of coordinates with largest sample variances is provided, and it is shown that if PCA is done on the selected subset, then consistency is recovered, even if p(n) ≫ n.
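The coordinate-selection idea can be sketched in a few lines: keep the coordinates with the largest sample variances, run ordinary PCA on that subset, and embed the leading eigenvector back into the full space. This is a simplified sketch: the paper thresholds the sample variances, whereas a fixed count `k` is assumed here for illustration.

```python
import numpy as np

def diagonal_thresholding_pca(X, k):
    """PCA on the k highest-variance coordinates, embedded back in R^p."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    variances = (Xc ** 2).mean(axis=0)
    keep = np.argsort(variances)[-k:]        # top-k variance coordinates
    S = Xc[:, keep].T @ Xc[:, keep] / n      # reduced sample covariance
    w, V = np.linalg.eigh(S)                 # eigenvalues in ascending order
    v_hat = np.zeros(p)
    v_hat[keep] = V[:, -1]                   # leading eigenvector, embedded
    return v_hat
```

When the leading population eigenvector is sparse and the spike is strong, the selected coordinates coincide with its support and the reduced PCA recovers the direction.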
Augmented sparse principal component analysis for high dimensional data
This work studies the problem of estimating the leading eigenvectors of a high-dimensional population covariance matrix based on independent Gaussian observations and proposes an estimator based on a coordinate selection scheme combined with PCA that achieves the optimal rate of convergence under a sparsity regime.
Sparse Principal Component Analysis and Iterative Thresholding
Under a spiked covariance model, a new iterative thresholding approach is proposed for estimating principal subspaces in the setting where the leading eigenvectors are sparse, and it is found that the new approach recovers the principal subspace and leading eigenvectors consistently, and even optimally, in a range of high-dimensional sparse settings.
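A crude sketch of the iterate-and-threshold idea for a d-dimensional principal subspace: an orthogonal-iteration (block power) step followed by hard thresholding of rows with small norm. The paper's actual update and threshold schedule differ; `thresh` and `n_iter` here are illustrative.

```python
import numpy as np

def sparse_subspace_sketch(S, d, thresh, n_iter=100, seed=0):
    """Orthogonal iteration with hard row-thresholding.

    Rows of the basis whose norm falls below `thresh` are set to zero,
    so coordinates outside the sparse support are progressively pruned.
    """
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((S.shape[0], d)))
    for _ in range(n_iter):
        Y = S @ Q                             # block power step
        norms = np.linalg.norm(Y, axis=1)
        Y[norms < thresh, :] = 0.0            # hard-threshold small rows
        Q, _ = np.linalg.qr(Y)                # re-orthonormalize
    return Q
```

On a noiseless spiked matrix with two sparse leading eigenvectors, the iteration zeroes the off-support rows exactly and recovers the spanned subspace.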
High-dimensional covariance matrix estimation with missing observations
This paper establishes non-asymptotic sparsity oracle inequalities for the estimation of the covariance matrix with the Frobenius and spectral norms, valid for any setting of the sample size and the dimension of the observations.
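The basic debiasing idea when each entry is observed independently with probability δ can be sketched as follows: in the Gram matrix of the zero-filled data, an off-diagonal entry survives only when two independent Bernoulli masks both fire (a factor δ²), while a diagonal entry involves a single mask (a factor δ). A minimal sketch, assuming centered data and known δ:

```python
import numpy as np

def covariance_missing(Y, mask, delta):
    """Unbiased covariance estimate from zero-filled partial observations.

    Y     : (n, p) data matrix (assumed centered)
    mask  : (n, p) boolean array, True where the entry was observed
    delta : probability that any given entry is observed
    """
    n = Y.shape[0]
    G = (Y * mask).T @ (Y * mask) / n        # naive Gram with zero-fill
    Sigma = G / delta**2                     # off-diagonal: two masks
    np.fill_diagonal(Sigma, np.diag(G) / delta)  # diagonal: one mask
    return Sigma
```

On simulated Gaussian data with 30% of entries deleted at random, the rescaled estimate concentrates around the true covariance matrix.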
Optimal Solutions for Sparse Principal Component Analysis
A new semidefinite relaxation is formulated and a greedy algorithm is derived that computes a full set of good solutions for all target numbers of nonzero coefficients, with total complexity O(n^3), where n is the number of variables.
Sparse Principal Component Analysis
This work introduces a new method called sparse principal component analysis (SPCA) using the lasso (elastic net) to produce modified principal components with sparse loadings and shows that PCA can be formulated as a regression-type optimization problem.
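The regression view can be sketched by regressing the first principal component scores on the data with an ℓ1 penalty, here plain lasso solved by ISTA rather than the paper's elastic-net criterion with alternating updates; `lam` and `n_iter` are illustrative.

```python
import numpy as np

def spca_loadings_sketch(X, lam=0.01, n_iter=500):
    """Sparse loadings via an l1-penalized regression of PC scores on X."""
    Xc = X - X.mean(axis=0)
    n = Xc.shape[0]
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    u = Xc @ Vt[0]                           # first principal component scores
    beta = np.zeros(X.shape[1])
    L = np.linalg.norm(Xc, 2) ** 2 / n       # Lipschitz constant of the gradient
    for _ in range(n_iter):                  # ISTA: gradient step + shrinkage
        grad = Xc.T @ (Xc @ beta - u) / n
        z = beta - grad / L
        beta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    nrm = np.linalg.norm(beta)
    return beta / nrm if nrm > 0 else beta
```

On spiked data with a sparse leading direction, the penalized loadings stay aligned with the ordinary first principal component while concentrating on its support.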
Optimal detection of sparse principal components in high dimension
The minimax optimal test is based on a sparse eigenvalue statistic, and a computationally efficient alternative test using convex relaxations is described, which is proved to detect sparse principal components at near optimal detection levels and performs well on simulated datasets.
Finite sample approximation results for principal component analysis: a matrix perturbation approach
A matrix perturbation view of the "phase transition phenomenon," and a simple linear-algebra based derivation of the eigenvalue and eigenvector overlap in this asymptotic limit of finite sample PCA are presented.