Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems
@article{Li2021NonconvexSS,
  title   = {Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems},
  author  = {Chris Junchi Li and Michael I. Jordan},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2112.14738}
}
Motivated by the problem of online canonical correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSGD) algorithm for minimizing the expectation of a stochastic function over a generic Riemannian manifold. SSGD generalizes the idea of projected stochastic gradient descent and allows the use of scaled stochastic gradients instead of stochastic gradients. In the special case of a spherical constraint, which arises in generalized eigenvector problems, we establish a…
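The abstract is truncated, so the paper's exact update rule and guarantees are not reproduced here. For intuition only, the following is a minimal sketch of a scaled stochastic-gradient step under a spherical constraint for the generalized eigenvector problem A w = λ B w; the function name, the `sample_pair` oracle, the step size, and the specific update form are illustrative assumptions rather than the paper's algorithm.

```python
import numpy as np

def ssgd_sphere_gev_sketch(sample_pair, d, eta=0.05, n_steps=5000, seed=0):
    """Illustrative scaled stochastic-gradient step on the unit sphere for the
    generalized eigenvector problem A w = lambda B w.

    `sample_pair()` is assumed to return an unbiased stochastic pair (A_t, B_t)
    of the population matrices (A, B); the step, scaling, and retraction below
    are illustrative choices, not the paper's exact update.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)                      # spherical constraint ||w|| = 1
    for _ in range(n_steps):
        A_t, B_t = sample_pair()                # stochastic samples of (A, B)
        rho = (w @ A_t @ w) / (w @ B_t @ w)     # stochastic Rayleigh-quotient estimate
        g = A_t @ w - rho * (B_t @ w)           # scaled stochastic gradient direction
        w = w + eta * g                         # gradient step in the ambient space
        w /= np.linalg.norm(w)                  # retraction: project back to the sphere
    return w
```

On a synthetic problem where `sample_pair` returns noisy estimates of fixed matrices (A, B), the iterate should align, up to sign, with the leading generalized eigenvector of the pair, which can be checked against `scipy.linalg.eigh(A, B)`.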
One Citation
A Generalized EigenGame with Extensions to Multiview Representation Learning
- Computer Science, ArXiv
- 2022
A game-theory-inspired approach to GEPs that uses Hebbian and game-theoretic updates in the linear case; the method permits extension to general function approximators, such as neural networks, for certain GEPs used in dimensionality reduction, including CCA for deep multiview learning.
References
Showing 1-10 of 49 references
Convergence of Stochastic Gradient Descent for PCA
- Computer Science, ICML
- 2016
This paper provides the first eigengap-free convergence guarantees for SGD in the context of PCA in a streaming stochastic setting, and shows that the same techniques lead to new SGD convergence guarantees with better dependence on the eigengap.
On Constrained Nonconvex Stochastic Optimization: A Case Study for Generalized Eigenvalue Decomposition
- Mathematics, AISTATS
- 2019
A simple, efficient, stochastic primal-dual algorithm for the online GEV problem is proposed; an asymptotic rate of convergence is established, and the first sample complexity result is obtained via diffusion approximations, which are widely used in applied probability.
On Nonconvex Optimization for Machine Learning: Gradients, Stochasticity, and Saddle Points
- Computer Science
- 2019
Perturbed versions of GD and SGD are analyzed and shown to be truly efficient: their dimension dependence is only polylogarithmic.
Scalable Kernel Methods via Doubly Stochastic Gradients
- Computer Science, NIPS
- 2014
An approach that scales up kernel methods via a novel concept called "doubly stochastic functional gradients," based on the fact that many kernel methods can be expressed as convex optimization problems; it readily scales kernel methods up to regimes dominated by neural nets.
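As a hedged illustration of the "doubly stochastic" idea (sampling both a data point and a random feature at every step), here is a minimal least-squares sketch using random Fourier features for an RBF kernel; the function name, loss, and hyperparameters are assumptions, and the regularization and step-size schedules of the cited method are omitted.

```python
import numpy as np

def doubly_stochastic_sketch(X, y, n_steps=2000, gamma=1.0, lr=0.1, seed=0):
    """Minimal sketch: at each step, sample one data point AND one random
    Fourier feature of the RBF kernel k(x, z) = exp(-gamma * ||x - z||^2),
    so neither the kernel matrix nor a fixed feature map is ever formed."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    omegas, offsets, coefs = [], [], []                      # one random feature per step
    for _ in range(n_steps):
        i = rng.integers(n)                                  # stochastic data index
        omega = rng.normal(scale=np.sqrt(2.0 * gamma), size=d)  # feature frequency
        b = rng.uniform(0.0, 2.0 * np.pi)                    # feature phase
        # prediction with the features accumulated so far
        pred = sum(c * np.sqrt(2.0) * np.cos(X[i] @ w + t)
                   for c, w, t in zip(coefs, omegas, offsets))
        residual = pred - y[i]                               # squared-loss residual
        omegas.append(omega)
        offsets.append(b)
        coefs.append(-lr * residual * np.sqrt(2.0) * np.cos(X[i] @ omega + b))
    return omegas, offsets, coefs
```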
Recent Advances in Stochastic Riemannian Optimization
- Computer Science
- 2020
This chapter outlines numerous stochastic optimization algorithms on manifolds, ranging from the basic stochastic gradient method to more advanced variance-reduced stochastic methods, and presents a unified summary of convergence results.
First-order Methods for Geodesically Convex Optimization
- Computer Science, Mathematics, COLT
- 2016
This work is the first to provide global complexity analysis for first-order algorithms for general g-convex optimization, proving upper bounds on the global complexity of deterministic and stochastic (sub)gradient methods for optimizing smooth and nonsmooth g-convex functions, both with and without strong g-convexity.
Stochastic Canonical Correlation Analysis
- Computer Science, J. Mach. Learn. Res.
- 2019
The sample complexity of canonical correlation analysis is studied, showing that, given an estimate of the canonical correlation, a streaming version of shift-and-invert power iterations achieves the same learning accuracy with the same level of sample complexity while processing the data only once.
The landscape of empirical risk for nonconvex losses
- Computer Science, Mathematics, The Annals of Statistics
- 2018
It is demonstrated that, in several problems such as nonconvex binary classification, robust regression, and Gaussian mixture models, this result implies a complete characterization of the landscape of the empirical risk and of the convergence properties of descent algorithms.
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
- Computer Science, J. Mach. Learn. Res.
- 2011
This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as those of the best proximal function that can be chosen in hindsight.
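As a hedged, minimal illustration of the adaptive idea described above (coordinate-wise step sizes built from accumulated squared gradients, i.e., the diagonal variant), with illustrative parameter names:

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=0.1, eps=1e-8):
    """One diagonal-AdaGrad update: each coordinate's effective step size
    shrinks with its accumulated squared gradients, so the single scalar
    learning rate `lr` needs little hand tuning."""
    accum = accum + grad ** 2                      # running sum of squared gradients
    w = w - lr * grad / (np.sqrt(accum) + eps)     # coordinate-wise scaled step
    return w, accum
```

A typical training loop would call this repeatedly with `accum` initialized to `np.zeros_like(w)`.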
A Decomposition Algorithm for the Sparse Generalized Eigenvalue Problem
- Computer Science, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
A new, effective decomposition method that uses random and/or swapping strategies to find a working set and performs a global combinatorial search over the small subset of variables to solve the sparse generalized eigenvalue problem.