Corpus ID: 245537333

Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems

@article{Li2021NonconvexSS,
  title={Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems},
  author={Chris Junchi Li and Michael I. Jordan},
  journal={ArXiv},
  year={2021},
  volume={abs/2112.14738}
}
Motivated by the problem of online canonical correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSGD) algorithm for minimizing the expectation of a stochastic function over a generic Riemannian manifold. SSGD generalizes the idea of projected stochastic gradient descent and allows the use of scaled stochastic gradients instead of stochastic gradients. In the special case of a spherical constraint, which arises in generalized eigenvector problems, we establish a… 
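
To give a rough sense of the iteration, below is a minimal, illustrative sketch of a stochastic scaled-gradient step with a spherical constraint, applied to the leading generalized eigenvector of a matrix pair (A, B). The toy sampling model, the running estimate of B used as the scaling, and the step-size schedule are assumptions made for illustration; this is not the paper's exact algorithm.

import numpy as np

# Illustrative sketch only: a projected stochastic scaled-gradient iteration
# for the top generalized eigenvector of a pair (A, B), i.e. increasing the
# generalized Rayleigh quotient v^T A v / v^T B v over the unit sphere.
# The sampling model, scaling estimate, and step sizes are assumptions.

rng = np.random.default_rng(0)
d, T, eta0 = 8, 5000, 0.1

# Ground-truth matrices, used only to generate stochastic samples (toy setup).
Q = rng.normal(size=(d, d)) / np.sqrt(d)
A = Q @ Q.T + np.diag(np.linspace(d, 1.0, d))   # has a clear leading direction
B = np.eye(d) + 0.5 * Q @ Q.T                   # positive-definite scaling matrix

v = rng.normal(size=d)
v /= np.linalg.norm(v)
B_bar = np.eye(d)                               # running estimate of B

for t in range(1, T + 1):
    # One stochastic sample of A and B per step (e.g. from a single data point).
    xa = rng.multivariate_normal(np.zeros(d), A)
    xb = rng.multivariate_normal(np.zeros(d), B)
    A_t, B_t = np.outer(xa, xa), np.outer(xb, xb)

    B_bar += (B_t - B_bar) / t                  # running average of the B samples

    # Scaled stochastic gradient: the raw stochastic gradient
    # A_t v - (v^T A_t v) B_t v is rescaled by the inverse of B_bar.
    grad = A_t @ v - (v @ A_t @ v) * (B_t @ v)
    g = np.linalg.solve(B_bar + 1e-8 * np.eye(d), grad)

    v += (eta0 / np.sqrt(t)) * g                # scaled-gradient step
    v /= np.linalg.norm(v)                      # project back onto the sphere

# v now approximates the leading generalized eigenvector of (A, B).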

Citations

A Generalized EigenGame with Extensions to Multiview Representation Learning

A game-theoretic approach to generalized eigenvalue problems (GEPs) that uses Hebbian updates in the linear case but permits extension to general function approximators such as neural networks for certain GEPs, including CCA for deep multiview representation learning.

References


Convergence of Stochastic Gradient Descent for PCA

This paper provides the first eigengap-free convergence guarantees for SGD in the context of PCA in a streaming stochastic setting, and shows that the same techniques lead to new SGD convergence guarantees with better dependence on the eigengap.
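
For context, a bare-bones version of streaming SGD for the top principal component (an Oja-style iteration with sphere projection) looks roughly like the following; the synthetic data stream and step-size choice are illustrative assumptions.

import numpy as np

# Illustrative sketch: Oja-style streaming SGD for the top principal component.
# The synthetic data stream and step-size schedule are assumptions.

rng = np.random.default_rng(0)
d, T = 10, 20000
scales = np.array([3.0] + [1.0] * (d - 1))   # dominant first coordinate (toy data)

w = rng.normal(size=d)
w /= np.linalg.norm(w)

for t in range(1, T + 1):
    x = scales * rng.normal(size=d)          # one sample from the stream
    eta = 1.0 / np.sqrt(t)                   # decaying step size (illustrative)
    w += eta * (x @ w) * x                   # stochastic gradient step for w^T x x^T w
    w /= np.linalg.norm(w)                   # project back to the unit sphere

# w should be close (up to sign) to the first standard basis vector here.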

On Constrained Nonconvex Stochastic Optimization: A Case Study for Generalized Eigenvalue Decomposition

A simple, efficient stochastic primal-dual algorithm for the online GEV problem is proposed; an asymptotic rate of convergence is established, and the first sample complexity result is obtained via diffusion approximations, which are widely used in applied probability.

On Nonconvex Optimization for Machine Learning: Gradients, Stochasticity, and Saddle Points

Perturbed versions of GD and SGD are analyzed and it is shown that they are truly efficient: their dimension dependence is only polylogarithmic.

Scalable Kernel Methods via Doubly Stochastic Gradients

An approach that scales up kernel methods using a novel concept called "doubly stochastic functional gradients," based on the fact that many kernel methods can be expressed as convex optimization problems; it can readily scale kernel methods up to regimes that are dominated by neural nets.
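
As a rough illustration of that idea, the sketch below runs functional SGD for kernel ridge regression with an RBF kernel, where each step samples both a data point and a fresh random Fourier feature; the toy data, kernel bandwidth, and step sizes are assumptions.

import numpy as np

# Illustrative sketch of "doubly stochastic" functional gradients for kernel
# ridge regression with an RBF kernel: each iteration samples a data point
# AND a fresh random Fourier feature. Toy data and hyperparameters are assumptions.

rng = np.random.default_rng(0)
d, n, T = 5, 200, 2000
sigma, lam, eta0 = 1.0, 1e-3, 0.5

X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

omegas = np.zeros((T, d))    # random frequencies drawn so far
phases = np.zeros(T)
alphas = np.zeros(T)         # one coefficient per sampled feature

def predict(x, t):
    # f_t(x) = sum_{s < t} alpha_s * sqrt(2) * cos(omega_s . x + b_s)
    if t == 0:
        return 0.0
    feats = np.sqrt(2.0) * np.cos(omegas[:t] @ x + phases[:t])
    return float(alphas[:t] @ feats)

for t in range(T):
    i = rng.integers(n)                                  # sample a data point
    omegas[t] = rng.normal(scale=1.0 / sigma, size=d)    # sample a random feature
    phases[t] = rng.uniform(0.0, 2.0 * np.pi)

    eta = eta0 / (1.0 + t)
    err = predict(X[i], t) - y[i]                        # squared-loss residual
    alphas[:t] *= 1.0 - eta * lam                        # decay from the regularizer
    phi = np.sqrt(2.0) * np.cos(omegas[t] @ X[i] + phases[t])
    alphas[t] = -eta * err * phi                         # coefficient for the new feature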

Recent Advances in Stochastic Riemannian Optimization

This chapter outlines numerous stochastic optimization algorithms on manifolds, ranging from the basic stochastic gradient method to more advanced variance-reduced stochastic methods, and presents a unified summary of convergence results.

First-order Methods for Geodesically Convex Optimization

This work is the first to provide global complexity analysis for first-order algorithms for general g-convex optimization, proving upper bounds on the global complexity of deterministic and stochastic (sub)gradient methods for optimizing smooth and nonsmooth g-convex functions, both with and without strong g-convexity.

Stochastic Canonical Correlation Analysis

The sample complexity of canonical correlation analysis is studied to show that, given an estimate of the canonical correlation, the streaming version of shift-and-invert power iterations achieves the same learning accuracy with the same level of sample complexity while processing the data only once.

The landscape of empirical risk for nonconvex losses

It is demonstrated that in several problems, such as non-convex binary classification, robust regression, and Gaussian mixture models, this result implies a complete characterization of the landscape of the empirical risk and of the convergence properties of descent algorithms.

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight.
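
In the diagonal case, the adaptive update amounts to per-coordinate step sizes driven by accumulated squared gradients; a minimal sketch on a toy online least-squares stream follows, with the data and base step size as assumptions.

import numpy as np

# Illustrative sketch of the diagonal AdaGrad-style update on an online
# least-squares stream; the data and base step size are assumptions.

rng = np.random.default_rng(0)
d, T, eta, eps = 20, 5000, 0.5, 1e-8

w_true = rng.normal(size=d)
w = np.zeros(d)
G = np.zeros(d)                          # accumulated squared gradients

for _ in range(T):
    x = rng.normal(size=d)
    y = x @ w_true + 0.01 * rng.normal()
    g = (x @ w - y) * x                  # gradient of 0.5 * (x^T w - y)^2
    G += g * g
    w -= eta * g / (np.sqrt(G) + eps)    # per-coordinate adaptive step size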

A Decomposition Algorithm for the Sparse Generalized Eigenvalue Problem

A new, effective decomposition method that uses random and/or swapping strategies to find a working set and performs a global combinatorial search over the small subset of variables to solve the sparse generalized eigenvalue problem.