How to Reduce Dimension With PCA and Random Projections?

@article{YangLiuDobribanWoodruff,
  title={How to Reduce Dimension With PCA and Random Projections?},
  author={Fan Yang and Sifan Liu and E. Dobriban and David P. Woodruff},
  journal={IEEE Transactions on Information Theory}
}
In our “big data” age, the size and complexity of data are steadily increasing, and methods for dimension reduction are ever more popular and useful. Two distinct types of dimension reduction are “data-oblivious” methods, such as random projections and sketching, and “data-aware” methods, such as principal component analysis (PCA). Each has its strengths: speed for random projections, and data-adaptivity for PCA. In this work, we study how to combine them to get the best of both. We study …
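The contrast between the two families can be seen in a minimal numpy sketch (the data, dimensions, and seed below are illustrative, not from the paper): PCA projects onto the top singular directions of the data itself, while a Gaussian random projection is drawn independently of the data. By the Eckart–Young theorem, PCA's rank-$k$ reconstruction error lower-bounds that of any rank-$k$ linear reconstruction, including one built from a random projection.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n samples in d dimensions with approximate rank-k structure.
n, d, k = 500, 100, 5
X = rng.standard_normal((n, k)) @ rng.standard_normal((k, d)) \
    + 0.01 * rng.standard_normal((n, d))
X -= X.mean(axis=0)  # center, as PCA requires

# Data-aware: PCA via the top-k right singular vectors of X.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
X_pca = X @ Vt[:k].T          # (n, k) principal-component scores

# Data-oblivious: Gaussian random projection to the same dimension.
S = rng.standard_normal((d, k)) / np.sqrt(k)
X_rp = X @ S                  # (n, k) sketched coordinates

# Compare reconstruction errors: PCA's is optimal among rank-k maps,
# so it can never exceed the random projection's.
err_pca = np.linalg.norm(X - X_pca @ Vt[:k], "fro")
X_rp_back = X_rp @ np.linalg.pinv(S)   # naive linear reconstruction
err_rp = np.linalg.norm(X - X_rp_back, "fro")
```

The random projection needs no pass over the data to construct, which is its speed advantage; PCA spends an SVD to earn the smaller error.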


Precise expressions for random projections: Low-rank approximation and randomized Newton
This work exploits recent developments in the spectral analysis of random matrices to develop novel techniques that provide provably accurate expressions for the expected value of random projection matrices obtained via sketching, and enables precise analysis of these methods in terms of spectral properties of the data.
Scaling Graph Clustering with Distributed Sketches
This work presents a method inspired by spectral clustering, in which matrix sketches derived from random dimension-reducing projections produce embeddings that yield performant clustering results on a fully-dynamic stochastic block model stream, using both the fast Johnson–Lindenstrauss and CountSketch transforms.
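CountSketch, one of the two transforms named above, can be written in a few lines of numpy (the sizes and seed here are illustrative): each input coordinate is hashed to one bucket and flipped by a random sign, so applying it costs time proportional to the number of nonzeros of the input.

```python
import numpy as np

rng = np.random.default_rng(0)

def countsketch(X, m, rng):
    """Apply CountSketch to the columns of X: each of the d coordinates
    is hashed to one of m buckets and multiplied by a random sign."""
    d = X.shape[1]
    buckets = rng.integers(0, m, size=d)      # hash h: [d] -> [m]
    signs = rng.choice([-1.0, 1.0], size=d)   # sign s: [d] -> {-1, +1}
    Y = np.zeros((X.shape[0], m))
    for j in range(d):
        Y[:, buckets[j]] += signs[j] * X[:, j]
    return Y, buckets, signs

n, d, m = 20, 1000, 200
X = rng.standard_normal((n, d))
Y, buckets, signs = countsketch(X, m, rng)
```

Equivalently, the loop applies a sparse $d \times m$ matrix with exactly one $\pm 1$ per row; the explicit dense matrix is never formed, which is the point of the construction.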
An Introduction to Johnson-Lindenstrauss Transforms
Johnson–Lindenstrauss transforms are powerful tools for reducing the dimensionality of data while preserving key characteristics of that data, and they have found use in many fields, from machine learning …
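The guarantee behind these transforms is easy to check empirically with a dense Gaussian sketch (the point counts, dimensions, and seed below are chosen for illustration): projecting to $m = O(\log(n)/\epsilon^2)$ dimensions preserves every pairwise distance up to a factor $1 \pm \epsilon$ with high probability.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

# n points in high dimension d, projected down to m dimensions.
n, d, m = 50, 2000, 400
X = rng.standard_normal((n, d))

# Dense Gaussian JL transform: i.i.d. N(0, 1/m) entries, so squared
# lengths are preserved in expectation.
S = rng.standard_normal((d, m)) / np.sqrt(m)
Y = X @ S

# Empirical distortion of every pairwise distance.
ratios = np.array([
    np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
    for i, j in combinations(range(n), 2)
])
```

With $m = 400$ the typical per-pair distortion is a few percent; structured transforms (fast JL, CountSketch) trade this dense multiply for near-linear application time.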
Sample canonical correlation coefficients of high-dimensional random vectors with finite rank correlations
Consider two random vectors $\tilde{x} \in \mathbb{R}^p$ and $\tilde{y} \in \mathbb{R}^q$ of the forms $\tilde{x} = Az + C_1 x$ and $\tilde{y} = Bz + C_2 y$, where $x$, $y$ and $z$ are independent random vectors with i.i.d. entries of zero mean and …
Local laws for multiplication of random matrices and spiked invariant model
High dimensional Haar random matrices are common objects in modern statistical learning theory. We consider the random matrix model $A^{1/2} UBU^* A^{1/2}$, where $A$ and $B$ are two $N \times N$ …
Sparse sketches with small inversion bias
A framework for analyzing inversion bias is developed, based on the proposed concept of an $(\epsilon,\delta)$-unbiased estimator for random matrices, and a new sketching technique is proposed, called LEverage Score Sparsified (LESS) embeddings, which uses ideas from both data-oblivious sparse embeddings and data-aware leverage-based row sampling methods.
The Product of Gaussian Matrices is Close to Gaussian
It is shown that, provided each intermediate dimension $d_i$ satisfies $d_i \ge C p \cdot q$, where $C \ge C_0$ for a constant $C_0 > 0$ depending on $r$, the matrix product $G_1 G_2 \cdots G_r$ has variation distance at most $\delta$ to a $p \times q$ matrix $G$ of i.i.d. …
Dimensionality reduction, regularization, and generalization in overparameterized regressions
It is shown that OLS is arbitrarily susceptible to data-poisoning attacks in the overparameterized regime, unlike in the underparameterized regime, and that regularization and dimensionality reduction improve robustness.
Limiting Spectrum of Randomized Hadamard Transform and Optimal Iterative Sketching Methods
The limiting spectrum and asymptotic freeness of random matrices are leveraged to obtain an exact analysis of iterative sketching methods for solving least squares problems and yield optimal step-sizes and convergence rates.
Optimal Iterative Sketching with the Subsampled Randomized Hadamard Transform
This work studies the performance of iterative Hessian sketch for least-squares problems by leveraging and extending recent results from random matrix theory on the limiting spectrum of matrices randomly projected with the subsampled randomized Hadamard transform, and truncated Haar matrices.
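The iterative Hessian sketch studied in these two entries admits a short numpy illustration (a Gaussian sketch stands in for the SRHT here, and all sizes and the seed are chosen for illustration): each step solves a Newton-type system with a sketched Hessian $(SA)^\top (SA)$ but the exact gradient, so the iterates contract toward the true least-squares solution rather than a one-shot approximation of it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overdetermined least squares: min_x ||Ax - b||_2.
n, d = 2000, 20
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

# Iterative Hessian sketch with a fresh m x n Gaussian sketch per step.
m = 200
x = np.zeros(d)
for _ in range(10):
    S = rng.standard_normal((m, n)) / np.sqrt(m)
    SA = S @ A
    grad = A.T @ (A @ x - b)                 # exact gradient
    x = x - np.linalg.solve(SA.T @ SA, grad)  # sketched Hessian solve

rel_err = np.linalg.norm(x - x_ls) / np.linalg.norm(x_ls)
```

Each iteration costs a small $m \times d$ sketch and a $d \times d$ solve instead of factoring the full $n \times d$ system; with fresh sketches the error contracts geometrically, at a rate governed by $\sqrt{d/m}$.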


Asymptotics for Sketching in Least Squares Regression
The limits of the accuracy loss (for estimation and test error) incurred by popular sketching methods are found, and separation between different methods is shown; for instance, the SRHT is better than Gaussian projections.
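The accuracy loss analyzed above concerns one-shot "sketch-and-solve" regression, which a few lines of numpy make concrete (a Gaussian projection is used here for simplicity; sizes and seed are illustrative): the full $n \times d$ problem is replaced by a compressed $m \times d$ one, trading some statistical accuracy for a much smaller solve.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overdetermined least-squares problem: min_x ||Ax - b||_2.
n, d = 2000, 20
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.1 * rng.standard_normal(n)

# Exact solution for reference.
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

# Sketch-and-solve: compress the rows with an m x n Gaussian sketch
# (m << n) and solve the small m x d problem instead.
m = 200
S = rng.standard_normal((m, n)) / np.sqrt(m)
x_sk, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

rel_err = np.linalg.norm(x_sk - x_ls) / np.linalg.norm(x_ls)
```

The sketched solution is close to, but not equal to, the exact one; the paper's point is that this gap depends on the sketch, with structured transforms like the SRHT losing less than Gaussian projections at the same sketch size.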
Improved Approximation Algorithms for Large Matrices via Random Projections
  • Tamás Sarlós
  • Mathematics, Computer Science
  • 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06)
The key idea is that low-dimensional embeddings can be used to eliminate data dependence and provide more versatile, linear-time, pass-efficient matrix computation.
Randomized sketches of convex programs with sharp guarantees
This work analyzes RP-based approximations of convex programs, in which the original optimization problem is approximated by the solution of a lower-dimensional problem, and proves that the approximation ratio of this procedure can be bounded in terms of the geometry of the constraint set.
Randomized Matrix Decompositions using R
This work presents the R package rsvd, and provides a tutorial introduction to randomized matrix decompositions, showing the computational advantage over other methods implemented in R for approximating matrices with low-rank structure.
Linear dimensionality reduction: survey, insights, and generalizations
This survey and generic solver suggest that linear dimensionality reduction can move toward becoming a black-box, objective-agnostic numerical technology.
An Algorithm for the Principal Component Analysis of Large Data Sets
This work adapts one of these randomized methods for principal component analysis (PCA) for use with data sets that are too large to be stored in random-access memory (RAM), and reports on the performance of the algorithm.
Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions
This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation, and presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions.
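The modular framework surveyed in that line of work, and implemented by packages such as rsvd, reduces to two stages: a randomized range finder followed by a small exact factorization. A minimal numpy sketch in that style (oversampling parameter, sizes, and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_svd(A, k, rng, p=10):
    """Two-stage randomized SVD: sample the range of A with a Gaussian
    test matrix (with p columns of oversampling), orthonormalize, then
    take an exact SVD of the small projected matrix."""
    m, n = A.shape
    Omega = rng.standard_normal((n, k + p))   # random test matrix
    Q, _ = np.linalg.qr(A @ Omega)            # orthonormal range basis
    B = Q.T @ A                               # small (k+p) x n matrix
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small                           # lift back to m dimensions
    return U[:, :k], s[:k], Vt[:k]

# Test on a matrix of exact rank 5: the range finder captures its
# column space, so the rank-5 factorization reconstructs it.
A = rng.standard_normal((300, 5)) @ rng.standard_normal((5, 200))
U, s, Vt = randomized_svd(A, 5, rng)
approx = U @ np.diag(s) @ Vt
```

The expensive operations touch $A$ only through matrix-matrix products, which is why the same template adapts to out-of-core PCA as in the entry above.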
Permutation methods for factor analysis and PCA
Researchers often have datasets measuring features $x_{ij}$ of samples, such as test scores of students. In factor analysis and PCA, these features are thought to be influenced by unobserved factors …
Random‐projection ensemble classification
We introduce a very general method for high-dimensional classification, based on careful combination of the results of applying an arbitrary base classifier to random projections of the …
A Statistical Perspective on Randomized Sketching for Ordinary Least-Squares
A framework for assessing algorithmic and statistical aspects of randomized sketching methods is developed; the statistical prediction efficiency (PE) and the statistical residual efficiency (RE) of the sketched LS estimator are considered; and upper bounds for several types of random projection and random sampling sketching algorithms are provided.