A Variance Controlled Stochastic Method with Biased Estimation for Faster Non-convex Optimization
@article{Bi2021AVC, title={A Variance Controlled Stochastic Method with Biased Estimation for Faster Non-convex Optimization}, author={Jia Bi and Steve R. Gunn}, journal={ArXiv}, year={2021}, volume={abs/2102.09893} }
This paper proposes a new optimization method, Variance Controlled Stochastic Gradient (VCSG), to improve the performance of the stochastic variance reduced gradient (SVRG) algorithm. To avoid over-reducing the variance of the gradient in SVRG, VCSG introduces a hyper-parameter λ that controls how much of SVRG's variance reduction is applied. Theory shows that the optimization method can converge using an unbiased gradient estimator, but in practice, biased gradient estimation can allow more…
2 Citations
Minimizing Client Drift in Federated Learning via Adaptive Bias Estimation
- Computer Science, ECCV
- 2022
This work proposes an adaptive algorithm that accurately estimates drift across clients in Federated Learning and induces stability by constraining the norm of the client-drift estimates, making it more practical for large-scale FL.
GhostShiftAddNet: More Features from Energy-Efficient Operations
- Computer Science, BMVC
- 2021
The proposed GhostShiftAddNet achieves higher classification accuracy with fewer FLOPs and parameters (reduced by up to 3×) than GhostNet, and improves inference latency on the Jetson Nano by 1.3× on the GPU and 2× on the CPU.
References
SHOWING 1-10 OF 43 REFERENCES
Stochastic Gradient Descent with Biased but Consistent Gradient Estimators
- Computer Science, ArXiv
- 2018
This work shows, in a general setting, that consistent gradient estimators result in the same convergence behavior as do unbiased ones, and opens several new research directions, including the development of more efficient SGD updates with consistent estimators and the design of efficient training algorithms for large-scale graphs.
Accelerating Stochastic Gradient Descent using Predictive Variance Reduction
- Computer Science, NIPS
- 2013
It is proved that this method enjoys the same fast convergence rate as those of stochastic dual coordinate ascent (SDCA) and Stochastic Average Gradient (SAG), but the analysis is significantly simpler and more intuitive.
Stochastic Variance Reduction for Nonconvex Optimization
- Computer Science, ICML
- 2016
This work proves non-asymptotic rates of convergence of SVRG for nonconvex optimization, and shows that it is provably faster than SGD and gradient descent.
Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization
- Computer Science, NeurIPS
- 2018
This work proposes a new stochastic gradient descent algorithm based on nested variance reduction that improves the best known gradient complexities of both SVRG and SCSG.
Accelerated gradient methods for nonconvex nonlinear and stochastic programming
- Computer Science, Math. Program.
- 2016
The AG method is generalized to solve nonconvex and possibly stochastic optimization problems, and it is demonstrated that, by properly specifying the stepsize policy, the AG method exhibits the best known rate of convergence for solving general nonconvex smooth optimization problems using first-order information, similarly to the gradient descent method.
Robust Stochastic Approximation Approach to Stochastic Programming
- Computer Science, Mathematics, SIAM J. Optim.
- 2009
It is intended to demonstrate that a properly modified SA approach can be competitive with, and even significantly outperform, the SAA method for a certain class of convex stochastic problems.
Lower Bounds for Non-Convex Stochastic Optimization
- Computer Science, Mathematics, ArXiv
- 2019
It is proved that, in the worst case, any algorithm requires at least $\epsilon^{-4}$ queries to find an $\epsilon$-stationary point, which establishes that stochastic gradient descent is minimax optimal in this model.
SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator
- Computer Science, Mathematics, NeurIPS
- 2018
This paper proposes a new technique named SPIDER, which can be used to track many deterministic quantities of interest with significantly reduced computational cost, and proves that SPIDER-SFO nearly matches the algorithmic lower bound for finding approximate first-order stationary points under the gradient Lipschitz assumption in the finite-sum setting.
A New Class of Incremental Gradient Methods for Least Squares Problems
- Computer Science, SIAM J. Optim.
- 1997
This work embeds both LMS and steepest descent, as well as other intermediate methods, within a one-parameter class of algorithms, and proposes a hybrid class of methods that combines the faster early convergence of LMS with the faster ultimate linear convergence rate of steepest descent.
Fast incremental method for smooth nonconvex optimization
- Computer Science, 2016 IEEE 55th Conference on Decision and Control (CDC)
- 2016
This paper analyzes the SAGA algorithm within an Incremental First-order Oracle framework, and shows that it converges to a stationary point provably faster than both gradient descent and stochastic gradient descent.