# Scaling Limit: Exact and Tractable Analysis of Online Learning Algorithms with Applications to Regularized Regression and PCA

@article{Wang2017ScalingLE, title={Scaling Limit: Exact and Tractable Analysis of Online Learning Algorithms with Applications to Regularized Regression and PCA}, author={Chuang Wang and Jonathan C. Mattingly and Yue M. Lu}, journal={ArXiv}, year={2017}, volume={abs/1712.04332} }

We present a framework for analyzing the exact dynamics of a class of online learning algorithms in the high-dimensional scaling limit. Our results are applied to two concrete examples: online regularized linear regression and principal component analysis. As the ambient dimension tends to infinity, and with proper time scaling, we show that the time-varying joint empirical measures of the target feature vector and its estimates provided by the algorithms will converge weakly to a deterministic…
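The two running examples from the abstract can be illustrated with minimal streaming updates. The step sizes, noise levels, and spiked-model sampling below are illustrative assumptions for this sketch, not the paper's exact scaling or setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500                                        # ambient dimension

xi = rng.standard_normal(n) / np.sqrt(n)       # target feature vector, norm ~ 1

# --- Online regularized linear regression: streaming SGD with an l2 penalty ---
w = np.zeros(n)
lam = 0.1                                      # regularization strength (illustrative)
eta_reg = 0.5 / n                              # step size scaled as O(1/n)
for _ in range(5 * n):                         # O(n) samples per unit of rescaled time
    a = rng.standard_normal(n)                 # fresh sensing vector
    y = a @ xi + 0.1 * rng.standard_normal()   # noisy linear observation
    w -= eta_reg * ((w @ a - y) * a + lam * w) # stochastic gradient step

# --- Online PCA: Oja's rule on samples from a spiked model ---
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
eta_oja = 0.05 / n                             # smaller step to tame fluctuations
for _ in range(5 * n):
    x = np.sqrt(n) * rng.standard_normal() * xi + rng.standard_normal(n)
    u += eta_oja * (x @ u) * x                 # Oja update toward the top eigenvector
    u /= np.linalg.norm(u)                     # keep the estimate on the unit sphere
```

With this scaling, both estimates align closely with the planted vector `xi`; the paper's contribution is a deterministic, exact characterization of such dynamics as n tends to infinity, rather than the simulation itself.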

## 19 Citations

Selected citing works:

The Scaling Limit of High-Dimensional Online Independent Component Analysis

- Computer Science · NIPS
- 2017

In the high-dimensional limit, the original coupled dynamics associated with the algorithm will be asymptotically "decoupled", with each coordinate independently solving a 1-D effective minimization problem via stochastic gradient descent.
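As a caricature of that decoupling (purely illustrative; the effective scalar objective and noise level below are assumptions, not the cited paper's derivation), each coordinate can be pictured as running its own independent 1-D SGD:

```python
import numpy as np

rng = np.random.default_rng(1)

# Caricature of "decoupling": in the limit, each coordinate k independently
# runs scalar SGD on an effective objective f_k(x) = (x - xi_k)^2 / 2,
# perturbed by an effective noise term.
d = 1000
xi = rng.standard_normal(d)                # per-coordinate targets
x = np.zeros(d)
eta = 0.05
for _ in range(500):
    noise = rng.standard_normal(d)         # effective stochastic term
    x -= eta * ((x - xi) + 0.5 * noise)    # independent scalar updates

mse = np.mean((x - xi) ** 2)               # coordinates hover near their targets
```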

Online Power Iteration For Subspace Estimation Under Incomplete Observations: Limiting Dynamics And Phase Transitions

- Computer Science · 2018 IEEE Statistical Signal Processing Workshop (SSP)
- 2018

This work shows that the dynamic performance of the imputation-based online power iteration method can be fully characterized by a finite-dimensional deterministic matrix recursion process, which provides an exact characterization of the relationship between estimation accuracy, sample complexity, and subsampling ratios.

A classification for the performance of online SGD for high-dimensional inference

- Computer Science · ArXiv
- 2020

This work investigates the performance of the simplest version of online SGD at attaining a "better than random" correlation with the unknown parameter, i.e., achieving weak recovery, and classifies the difficulty of typical instances of this task in terms of the number of samples required as the dimension diverges.

Subspace Estimation From Incomplete Observations: A High-Dimensional Analysis

- Computer Science · IEEE Journal of Selected Topics in Signal Processing
- 2018

We present a high-dimensional analysis of three popular algorithms, namely, Oja's method, GROUSE, and PETRELS, for subspace estimation from streaming and highly incomplete observations. We show that,…

A Solvable High-Dimensional Model of GAN

- Computer Science · NeurIPS
- 2019

It is proved that the macroscopic quantities measuring the quality of the training process converge to a deterministic process characterized by an ordinary differential equation (ODE), whereas the microscopic states containing all the detailed weights remain stochastic.

A Mean-Field Theory for Learning the Schönberg Measure of Radial Basis Functions

- Computer Science · ArXiv
- 2020

A projected particle Langevin optimization method to learn the distribution in the Schönberg integral representation of the radial basis functions from training samples is developed and analyzed, and the existence and uniqueness of the steady-state solutions of the derived PDE in the weak sense are established.

Streaming PCA and Subspace Tracking: The Missing Data Case

- Computer Science · Proceedings of the IEEE
- 2018

It is illustrated that streaming PCA and subspace tracking algorithms can be understood through algebraic and geometric perspectives, and they need to be adjusted carefully to handle missing data.

Mean Field Analysis of Deep Neural Networks

- Computer Science
- 2019

It is shown that, under suitable assumptions on the activation functions and the behavior for large times, the limit neural network recovers a global minimum (with zero loss for the objective function).

A Mean-Field Theory for Kernel Alignment with Random Features in Generative Adversarial Networks

- Computer Science · ArXiv
- 2019

A novel supervised learning method to optimize the kernel in maximum mean discrepancy generative adversarial networks (MMD GANs) is proposed; the MMD GAN with kernel learning attains higher inception scores as well as better Fréchet inception distances, and generates better images, compared to the generative moment matching network (GMMN) and the MMD GAN with untrained kernels.

Mean Field Analysis of Neural Networks: A Law of Large Numbers

- Computer Science · SIAM J. Appl. Math.
- 2020

It is rigorously proved that the empirical distribution of the neural network parameters converges to the solution of a nonlinear partial differential equation, which can be considered a law of large numbers for neural networks.

## References

Showing 1–10 of 68 references

The Scaling Limit of High-Dimensional Online Independent Component Analysis

- Computer Science · NIPS
- 2017

In the high-dimensional limit, the original coupled dynamics associated with the algorithm will be asymptotically "decoupled", with each coordinate independently solving a 1-D effective minimization problem via stochastic gradient descent.

Online learning for sparse PCA in high dimensions: Exact dynamics and phase transitions

- Computer Science · 2016 IEEE Information Theory Workshop (ITW)
- 2016

In the high-dimensional limit, the joint empirical measure of the underlying sparse eigenvector and its estimate provided by the algorithm is shown to converge weakly to a deterministic, measure-valued process.

Sharp Time–Data Tradeoffs for Linear Inverse Problems

- Mathematics, Computer Science · IEEE Transactions on Information Theory
- 2018

The results demonstrate that a linear convergence rate is attainable even though the least squares objective is not strongly convex in these settings, and present a unified convergence analysis of the gradient projection algorithm applied to such problems.

The Dynamics of Message Passing on Dense Graphs, with Applications to Compressed Sensing

- Computer Science · IEEE Transactions on Information Theory
- 2010

This paper proves that state evolution indeed holds asymptotically in the large-system limit for sensing matrices with independent and identically distributed Gaussian entries, thereby providing a rigorous foundation for the state-evolution heuristic.

Asymptotic Analysis of MAP Estimation via the Replica Method and Applications to Compressed Sensing

- Computer Science · IEEE Transactions on Information Theory
- 2012

It is shown that, with random linear measurements and Gaussian noise, the replica-symmetric prediction is correct: the asymptotic behavior of the postulated MAP estimate of an n-dimensional vector "decouples" into n scalar postulated MAP estimators.

A Direct Formulation for Sparse PCA Using Semidefinite Programming

- Computer Science · SIAM Rev.
- 2007

A modification of the classical variational representation of the largest eigenvalue of a symmetric matrix is used, where cardinality is constrained, and a semidefinite programming-based relaxation is derived for the sparse PCA problem.

Vector approximate message passing

- Computer Science · 2017 IEEE International Symposium on Information Theory (ISIT)
- 2017

This paper considers a “vector AMP” (VAMP) algorithm and shows that VAMP has a rigorous scalar state-evolution that holds under a much broader class of large random matrices A: those that are right-rotationally invariant.

Robust Stochastic Approximation Approach to Stochastic Programming

- Computer Science, Mathematics · SIAM J. Optim.
- 2009

It is intended to demonstrate that a properly modified SA approach can be competitive and even significantly outperform the SAA method for a certain class of convex stochastic problems.

Phase transitions in semidefinite relaxations

- Computer Science · Proceedings of the National Academy of Sciences
- 2016

Asymptotic predictions are developed for several detection thresholds, as well as for the estimation error above these thresholds, clarifying the effectiveness of SDP relaxations in solving high-dimensional statistical problems.

Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization

- Computer Science · ICML
- 2012

This paper investigates the optimality of SGD in a stochastic setting and shows that for smooth problems the algorithm attains the optimal O(1/T) rate; however, for non-smooth problems the convergence rate with averaging might really be Ω(log(T)/T), and this is not just an artifact of the analysis.