Corpus ID: 231846710

Streaming k-PCA: Efficient guarantees for Oja's algorithm, beyond rank-one updates

@inproceedings{Huang2021StreamingKE,
  title={Streaming k-PCA: Efficient guarantees for Oja's algorithm, beyond rank-one updates},
  author={De Huang and Jonathan Niles-Weed and Rachel A. Ward},
  booktitle={Annual Conference on Computational Learning Theory},
  year={2021}
}
We analyze Oja’s algorithm for streaming k-PCA, and prove that it achieves performance nearly matching that of an optimal offline algorithm. Given access to a sequence of i.i.d. d × d symmetric matrices, we show that Oja’s algorithm can obtain an accurate approximation to the subspace of the top k eigenvectors of their expectation using a number of samples that scales polylogarithmically with d. Previously, such a result was only known in the case where the updates have rank one. Our analysis is… 
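For intuition, here is a minimal numpy sketch of a generic Oja-style update with d × d symmetric matrix samples and QR re-orthonormalization; the constant step size eta, the random initialization, and the stream interface are illustrative assumptions, not the paper's precise procedure or step-size schedule.

import numpy as np

def oja_k_pca(matrix_stream, d, k, eta=0.01, seed=0):
    """Minimal Oja-style streaming k-PCA sketch.

    matrix_stream yields i.i.d. symmetric d x d matrices A_t whose
    expectation's top-k eigenspace we want to approximate.
    """
    rng = np.random.default_rng(seed)
    # Random orthonormal initialization of a d x k iterate.
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    for A_t in matrix_stream:
        Q = Q + eta * (A_t @ Q)    # Oja update with the full matrix sample
        Q, _ = np.linalg.qr(Q)     # re-orthonormalize the iterate
    return Q  # columns approximately span the top-k eigenspace of E[A_t]

As a usage example, matrix_stream could yield rank-one samples x_t x_tᵀ drawn i.i.d. from a distribution whose covariance has the desired top-k eigenspace, recovering the classical rank-one setting as a special case.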

On the equivalence of Oja's algorithm and GROUSE

It is shown that the Grassmannian Rank-One Subspace Estimation (GROUSE) algorithm is indeed equivalent to Oja’s algorithm in the sense that, at each iteration, given a step size for one of the algorithms, one may construct a step size for the other algorithm that results in an identical update.

On the Optimality of the Oja's Algorithm for Online PCA

It is proved that, with high probability, Oja’s algorithm achieves an efficient, gap-free, global convergence rate for approximating a principal component subspace of any sub-Gaussian distribution.

Bootstrapping the Error of Oja's Algorithm

A weighted χ² approximation result is established for the sin² error between the population eigenvector and the output of Oja’s algorithm, thereby establishing the bootstrap as a consistent inferential method in an appropriate asymptotic regime.
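As a small aside, for unit vectors the sin² error referenced above is simply 1 - (vᵀu)²; a minimal sketch of that quantity (the bootstrap machinery itself is not shown):

import numpy as np

def sin2_error(v, u):
    """sin^2 of the angle between two unit vectors: 1 - (v^T u)^2."""
    v = v / np.linalg.norm(v)
    u = u / np.linalg.norm(u)
    return 1.0 - float(v @ u) ** 2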

Stochastic Gauss-Newton Algorithms for Online PCA

A stochastic Gauss-Newton (SGN) algorithm is proposed for the online principal component analysis (OPCA) problem, which is formulated using the symmetric low-rank product model for dominant eigenspace calculation.

Robust Streaming PCA

This work considers streaming principal component analysis when the stochastic data-generating model is subject to perturbations and provides fundamental limits on convergence of any algorithm recovering principal components.

Preference Dynamics Under Personalized Recommendations

This work shows how to design content recommendations that achieve approximate stationarity, under mild conditions on the set of available content, when a user's preferences are known, and how one can learn enough about a user's preferences to implement such a strategy even when those preferences are initially unknown.

On the Correspondence between Gaussian Processes and Geometric Harmonics

The correspondence between Gaussian process regression and Geometric Harmonics is discussed, providing alternative interpretations of uncertainty in terms of error estimation, or leading towards accelerated Bayesian Optimization due to dimensionality reduction.

References

Showing 1–10 of 35 references

First Efficient Convergence for Streaming k-PCA: A Global, Gap-Free, and Near-Optimal Rate

The results match the information theoretic lower bound in terms of dependency on error, on eigengap, on rank k, and on dimension d, up to poly-log factors.

Rivalry of Two Families of Algorithms for Memory-Restricted Streaming PCA

This paper analyzes the convergence rate of a representative algorithm with decayed learning rate (Oja and Karhunen, 1985) in the first family for the general $k>1$ case, and proposes a novel algorithm for the second family that sets the block sizes automatically and dynamically, with a faster convergence rate.

AdaOja: Adaptive Learning Rates for Streaming PCA

AdaOja is a novel adaptation of the Adagrad algorithm to Oja's algorithm in the single-eigenvector case, extended to the multiple-eigenvector case, and it is demonstrated on dense synthetic data, sparse real-world data, and dense real-world data that AdaOja outperforms common learning rate choices for Oja’s method.
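A hedged sketch of what an Adagrad-style step size for a rank-one Oja update can look like in the single-eigenvector case; the initial accumulator b0 and the exact scaling are illustrative choices, not necessarily the authors' precise scheme.

import numpy as np

def ada_oja(samples, d, b0=1e-5, seed=0):
    """Adagrad-style step sizes for rank-one Oja updates (single vector)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    b_sq = b0 ** 2                    # running sum of squared gradient norms
    for x in samples:
        g = x * (x @ w)               # stochastic "gradient" x x^T w
        b_sq += float(g @ g)          # Adagrad accumulator
        w = w + g / np.sqrt(b_sq)     # adaptive step size
        w /= np.linalg.norm(w)        # keep the iterate on the unit sphere
    return w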

Fast Stochastic Algorithms for SVD and PCA: Convergence Properties and Convexity

O. Shamir, ICML 2016
The convergence properties of the VR-PCA algorithm introduced by Shamir (2015) are studied, including a formal analysis of a block version of the algorithm, and convergence from random initialization is proved.
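A rough single-vector sketch in the spirit of variance-reduced stochastic PCA, where each stochastic update is corrected by a control variate built from a per-epoch snapshot; the epoch length m and step size eta are illustrative assumptions, not the analyzed parameter choices.

import numpy as np

def vr_pca(X, epochs=5, m=None, eta=0.05, seed=0):
    """Variance-reduced stochastic PCA sketch (single vector).

    X is an (n, d) data matrix; the goal is the top eigenvector of (1/n) X^T X.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = m or n
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        w_snap = w.copy()
        u = X.T @ (X @ w_snap) / n    # full-data product at the snapshot
        for _ in range(m):
            i = rng.integers(n)
            x = X[i]
            # Stochastic update with a control variate from the snapshot.
            g = x * (x @ w) - x * (x @ w_snap) + u
            w = w + eta * g
            w /= np.linalg.norm(w)
    return w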

Convergence of Stochastic Gradient Descent for PCA

This paper provides the first eigengap-free convergence guarantees for SGD in the context of PCA in a streaming stochastic setting, and shows that the same techniques lead to new SGD convergence guarantees with better dependence on the eigengap.

Tight query complexity lower bounds for PCA via finite sample deformed Wigner law

A query complexity lower bound for approximating the top r-dimensional eigenspace of a matrix is established, along with a strict separation between convex optimization and “strict-saddle” non-convex optimization, of which PCA is a canonical example.

The Fast Convergence of Incremental PCA

The top eigenvector of A is computed in an incremental fashion, with an algorithm that maintains an estimate of the top eigenvector in O(d) space and incrementally adjusts the estimate with each new data point that arrives.

The Noisy Power Method: A Meta Algorithm with Applications

A new robust convergence analysis is provided for the noisy power method, a variant of the well-known power method for computing the dominant singular vectors of a matrix, showing that the algorithm's error dependence on the matrix dimension can be replaced by an essentially tight dependence on the coherence of the matrix.
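A minimal sketch of a noisy power iteration, where each matrix product is perturbed by an abstract noise term and the iterate is re-orthonormalized by QR; the matvec and noise interfaces are assumptions for illustration, and the noise model is left abstract.

import numpy as np

def noisy_power_method(matvec, d, k, iters, noise, seed=0):
    """Noisy power method sketch: Y_t = A X_{t-1} + G_t, then orthonormalize.

    matvec(X) should return A @ X for the (implicit) d x d matrix A;
    noise(shape) returns the perturbation G_t.
    """
    rng = np.random.default_rng(seed)
    X, _ = np.linalg.qr(rng.standard_normal((d, k)))
    for _ in range(iters):
        Y = matvec(X) + noise((d, k))   # inexact / noisy matrix product
        X, _ = np.linalg.qr(Y)          # re-orthonormalize the iterate
    return X  # approximate top-k invariant subspace, if the noise is small enough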

Global Convergence of Stochastic Gradient Descent for Some Non-convex Matrix Problems

This paper exhibits a step size scheme for SGD on a low-rank least-squares problem, and proves that, under broad sampling conditions, the method converges globally from a random starting point within $O(\epsilon^{-1} n \log n)$ steps with constant probability for constant-rank problems.

Memory Limited, Streaming PCA

An algorithm is presented that uses O(kp) memory and is able to compute the k-dimensional spike with O(p log p) sample complexity, the first algorithm of its kind.
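A hedged sketch of a block stochastic power method that keeps only a p × k iterate and a p × k accumulator (O(kp) working memory) rather than a p × p covariance matrix; the block size and initialization are illustrative, and this is not necessarily the paper's exact algorithm.

import numpy as np

def block_streaming_pca(sample_stream, p, k, block_size, seed=0):
    """Block stochastic power method sketch with O(kp) working memory."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((p, k)))
    S = np.zeros((p, k))
    count = 0
    for x in sample_stream:
        S += np.outer(x, x @ Q)       # accumulate (x x^T) Q without forming x x^T
        count += 1
        if count == block_size:
            Q, _ = np.linalg.qr(S)    # power-iteration step on the block sum
            S = np.zeros((p, k))
            count = 0
    return Q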