A Variance Controlled Stochastic Method with Biased Estimation for Faster Non-convex Optimization

Authors: Jia Bi, Steve R. Gunn

This paper proposes a novel optimization method, Variance Controlled Stochastic Gradient (VCSG), to improve the performance of the stochastic variance reduced gradient (SVRG) algorithm. To avoid over-reducing the gradient variance in SVRG, VCSG introduces a hyper-parameter λ that controls how much of the variance is reduced. Theory shows that an optimization method can converge using an unbiased gradient estimator, but in practice biased gradient estimation can allow more…
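The λ-control idea above can be sketched on a toy problem. The snippet below is a minimal illustration, not the paper's exact (possibly biased) estimator: it assumes λ linearly blends the plain stochastic gradient with the SVRG-corrected gradient on a synthetic least-squares objective, and shows that λ = 1 yields much lower estimator variance when the iterate is near the snapshot.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 500, 5
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_i(w, i):
    # per-example gradient of 0.5 * (a_i^T w - b_i)^2
    return (A[i] @ w - b[i]) * A[i]

def blended_estimator(w, w_snap, mu, i, lam):
    # hypothetical lambda-blend: lam=1 is the SVRG estimator, lam=0 is plain SGD
    svrg_term = grad_i(w, i) - grad_i(w_snap, i) + mu  # variance-reduced correction
    sgd_term = grad_i(w, i)                            # plain stochastic gradient
    return lam * svrg_term + (1.0 - lam) * sgd_term

w_snap = rng.normal(size=d)
mu = A.T @ (A @ w_snap - b) / n          # full gradient at the snapshot
w = w_snap + 0.01 * rng.normal(size=d)   # current iterate close to the snapshot

samples = {lam: np.array([blended_estimator(w, w_snap, mu, i, lam) for i in range(n)])
           for lam in (0.0, 1.0)}
var = {lam: samples[lam].var(axis=0).sum() for lam in (0.0, 1.0)}
# near the snapshot, the lam=1 (fully corrected) estimator has far lower variance
```

Both endpoints of this particular blend are unbiased; the role of λ here is only to interpolate how aggressively variance is reduced, which is one reading of the "controlled variance" idea.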

Minimizing Client Drift in Federated Learning via Adaptive Bias Estimation

This work proposes an adaptive algorithm that accurately estimates drift across clients in Federated Learning and induces stability by constraining the norm of estimates for client drift, making it more practical for large scale FL.

GhostShiftAddNet: More Features from Energy-Efficient Operations

The proposed GhostShiftAddNet can achieve higher classification accuracy with fewer FLOPs and parameters (reduced by up to 3×) than GhostNet and inference latency on the Jetson Nano is improved by 1.3× and 2× on the GPU and CPU respectively.

Stochastic Gradient Descent with Biased but Consistent Gradient Estimators

This work shows, in a general setting, that consistent gradient estimators result in the same convergence behavior as do unbiased ones, and opens several new research directions, including the development of more efficient SGD updates with consistent estimators and the design of efficient training algorithms for large-scale graphs.

Accelerating Stochastic Gradient Descent using Predictive Variance Reduction

It is proved that this method enjoys the same fast convergence rate as those of stochastic dual coordinate ascent (SDCA) and Stochastic Average Gradient (SAG), but the analysis is significantly simpler and more intuitive.
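The predictive variance reduction update described here (SVRG) can be sketched as follows. This is a minimal illustration on an assumed toy least-squares objective, not the paper's implementation: a full gradient is computed at a periodic snapshot, and each inner step uses an unbiased, variance-reduced estimator built from that snapshot.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_i(w, i):
    # per-example gradient of 0.5 * (a_i^T w - b_i)^2
    return (A[i] @ w - b[i]) * A[i]

def full_grad(w):
    return A.T @ (A @ w - b) / n

def svrg(w0, step=0.01, epochs=20, m=200):
    w = w0.copy()
    for _ in range(epochs):
        w_snap = w.copy()
        mu = full_grad(w_snap)  # full gradient at the snapshot
        for _ in range(m):
            i = rng.integers(n)
            # unbiased estimator; its variance vanishes as w approaches w_snap
            v = grad_i(w, i) - grad_i(w_snap, i) + mu
            w -= step * v
    return w

w = svrg(np.zeros(d))
w_star = np.linalg.lstsq(A, b, rcond=None)[0]  # reference least-squares solution
```

Because the correction term has variance proportional to the distance from the snapshot, the method converges linearly on this strongly convex example, unlike plain SGD with a fixed step size.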

Stochastic Variance Reduction for Nonconvex Optimization

This work proves non-asymptotic rates of convergence of SVRG for nonconvex optimization, and shows that it is provably faster than SGD and gradient descent.

Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization

This work proposes a new stochastic gradient descent algorithm based on nested variance reduction that improves upon the best known gradient complexities of SVRG and SCSG.

Accelerated gradient methods for nonconvex nonlinear and stochastic programming

The AG method is generalized to solve nonconvex and possibly stochastic optimization problems, and it is demonstrated that, with a properly specified stepsize policy, the AG method exhibits the best known rate of convergence for solving general nonconvex smooth optimization problems using only first-order information, similarly to the gradient descent method.

Robust Stochastic Approximation Approach to Stochastic Programming

It is intended to demonstrate that a properly modified SA approach can be competitive and even significantly outperform the SAA method for a certain class of convex stochastic problems.

Lower Bounds for Non-Convex Stochastic Optimization

It is proved that, in the worst case, any algorithm requires at least $\epsilon^{-4}$ queries to find an $\epsilon$-stationary point, which establishes that stochastic gradient descent is minimax optimal in this model.

SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator

This paper proposes a new technique named SPIDER, which can be used to track many deterministic quantities of interest with significantly reduced computational cost and proves that SPIDER-SFO nearly matches the algorithmic lower bound for finding approximate first-order stationary points under the gradient Lipschitz assumption in the finite-sum setting.
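The path-integrated estimator underlying SPIDER can be sketched as below. This is a hedged illustration on an assumed toy least-squares problem (the function names and hyper-parameters are mine, not the paper's): instead of referencing a fixed snapshot, the estimator is updated recursively at every step and reset with a full gradient every q iterations.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 5
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_i(w, i):
    # per-example gradient of 0.5 * (a_i^T w - b_i)^2
    return (A[i] @ w - b[i]) * A[i]

def full_grad(w):
    return A.T @ (A @ w - b) / n

def spider(w0, step=0.01, outer=10, q=100):
    w = w0.copy()
    for _ in range(outer):
        v = full_grad(w)  # periodic full-gradient reset
        w_prev = w.copy()
        w = w - step * v
        for _ in range(q - 1):
            i = rng.integers(n)
            # recursive (path-integrated) update: accumulate the gradient
            # drift along the iterate path instead of recomputing it
            v = grad_i(w, i) - grad_i(w_prev, i) + v
            w_prev = w.copy()
            w = w - step * v
    return w

w = spider(np.zeros(d))
```

Unlike the SVRG estimator, this recursive tracker is biased between resets, but its error is controlled by the step size, which is what enables the near-optimal complexity claimed above.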

A New Class of Incremental Gradient Methods for Least Squares Problems

This work embeds both LMS and steepest descent, as well as other intermediate methods, within a one-parameter class of algorithms, and proposes a hybrid class of methods that combines the fast early convergence rate of LMS with the fast ultimate linear convergence rate of steepest descent.

Fast incremental method for smooth nonconvex optimization

This paper analyzes the SAGA algorithm within an Incremental First-order Oracle framework, and shows that it converges to a stationary point provably faster than both gradient descent and stochastic gradient descent.