author={Saber Salehkaleybar and Sadegh Khorasani and Negar Kiyavash and Niao He and Patrick Thiran},
journal={ArXiv},
year={2022},
volume={abs/2205.08253}
}
• Published 17 May 2022
• Computer Science
• ArXiv
Variance-reduced gradient estimators for policy gradient methods have been one of the main focuses of research in reinforcement learning in recent years, as they allow acceleration of the estimation process. We propose a variance-reduced policy-gradient method, called SHARP, which incorporates second-order information into stochastic gradient descent (SGD) using momentum with a time-varying learning rate. The SHARP algorithm is parameter-free, achieving ε-approximate first-order…
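The abstract's "second-order information into SGD using momentum" can be illustrated with a momentum estimator that adds a Hessian-vector correction term. This is a hedged toy sketch on a quadratic, not the paper's exact SHARP estimator; the update rule, step sizes, and objective are illustrative assumptions.

```python
import numpy as np

# Toy smooth objective f(x) = 0.5 * x^T A x, so grad f(x) = A x and the
# Hessian-vector product is A v (exact here; stochastic in practice).
rng = np.random.default_rng(0)
A = np.diag([1.0, 4.0])

def grad(x, noise_scale=0.0):
    return A @ x + noise_scale * rng.standard_normal(2)

def hvp(x, v):
    return A @ v  # Hessian-vector product; no full Hessian is formed

# Momentum estimator with a second-order correction term (one hedged
# reading of "second-order information + momentum"; SHARP's exact
# estimator may differ):
#   d_t = a * g_t + (1 - a) * (d_{t-1} + H(x_t)(x_t - x_{t-1}))
x = np.array([1.0, 1.0])
d = grad(x)
lr, a = 0.1, 0.3
for t in range(200):
    x_new = x - lr * d
    g = grad(x_new, noise_scale=0.01)
    d = a * g + (1 - a) * (d + hvp(x_new, x_new - x))
    x = x_new

print(np.linalg.norm(grad(x)))  # true gradient norm at the final iterate
```

The Hessian-vector correction transports the old estimate `d` to the new iterate, so the momentum average tracks the current gradient instead of lagging behind it.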
1 Citation

## Figures and Tables from this paper

### Decentralized Natural Policy Gradient with Variance Reduction for Collaborative Multi-Agent Reinforcement Learning

• Computer Science
• 2022
A novel decentralized natural policy gradient method, dubbed Momentum-based Decentralized Natural Policy Gradient (MDNPG), is proposed, which incorporates natural gradient, momentum-based variance reduction, and gradient tracking into the decentralized stochastic gradient ascent framework.

## References

SHOWING 1-10 OF 41 REFERENCES

### Simple statistical gradient-following algorithms for connectionist reinforcement learning

This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement, in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, without explicitly computing gradient estimates.

### PAGE-PG: A Simple and Loopless Variance-Reduced Policy Gradient Method with Probabilistic Gradient Estimation

• Computer Science
ICML
• 2022
ProbAbilistic Gradient Estimation for Policy Gradient (PAGE-PG) is proposed: a novel loopless variance-reduced policy gradient method built on a probabilistic switch between two types of update, using an unbiased gradient estimator inspired by the PAGE estimator from supervised learning.
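The "probabilistic switch" idea can be sketched in a supervised-learning toy: with probability p take a fresh large-batch gradient, otherwise reuse the previous estimate plus a cheap small-batch correction evaluated on the same minibatch at both iterates. This is an illustrative sketch of the PAGE estimator, not PAGE-PG itself; the objective and constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.diag([1.0, 3.0])

def grad(x, noise):
    # Simulated minibatch gradient of f(x) = 0.5 * x^T A x.
    return A @ x + noise

def sample_noise(batch):
    # Minibatch noise shrinks with batch size.
    return rng.standard_normal(2) / np.sqrt(batch)

x = np.array([2.0, -1.0])
d = grad(x, sample_noise(100))   # initial large-batch estimate
lr, p = 0.1, 0.2
for t in range(300):
    x_prev, x = x, x - lr * d
    if rng.random() < p:
        d = grad(x, sample_noise(100))            # large-batch refresh
    else:
        xi = sample_noise(4)                      # one shared small minibatch
        d = d + grad(x, xi) - grad(x_prev, xi)    # recursive correction

print(np.linalg.norm(A @ x))  # true gradient norm at the final iterate
```

The switch makes the method loopless: there is no fixed-length inner loop of corrections between checkpoints, only a coin flip per step.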

### Better SGD using Second-order Momentum

• Computer Science
• 2021
A new algorithm is developed that finds an ε-critical point in the optimal O(ε^{-3}) stochastic gradient and Hessian-vector product computations; the second-order momentum leads to better gradient estimates in a manner analogous to variance-reduction methods.

### Momentum-Based Policy Gradient Methods

• Computer Science
ICML
• 2020
A class of efficient momentum-based policy gradient methods for model-free reinforcement learning is proposed; these methods use adaptive learning rates, do not require any large batches, and reach the best-known sample complexity of $O(\epsilon^{-3})$.

### Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations

• Computer Science
COLT
• 2020
An algorithm which finds an $\epsilon$-approximate stationary point using stochastic gradient and Hessian-vector products is designed, and a lower bound is proved which establishes that this rate is optimal and cannot be improved using stochastic $p$th-order methods for any $p \ge 2$, even when the first $p$ derivatives of the objective are Lipschitz.
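The "stochastic gradient and Hessian-vector product" oracle used by these methods can be realized without ever forming the Hessian: a Hessian-vector product is a directional derivative of the gradient, approximable by central differences, $H(x)v \approx (\nabla f(x + \delta v) - \nabla f(x - \delta v)) / (2\delta)$. A minimal check on a quadratic (the objective here is an illustrative assumption):

```python
import numpy as np

# f(x) = 0.5 * x^T A x + b^T x, whose gradient is A x + b and Hessian is A.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])

def grad(x):
    return A @ x + b

def hvp_fd(x, v, delta=1e-5):
    # Central-difference Hessian-vector product: two gradient calls,
    # no Hessian matrix is ever materialized.
    return (grad(x + delta * v) - grad(x - delta * v)) / (2 * delta)

x = np.array([0.3, 0.7])
v = np.array([1.0, 2.0])
print(hvp_fd(x, v))   # ≈ A @ v = [3.0, 2.5]
```

In practice automatic differentiation gives the same product exactly at the cost of roughly one extra gradient evaluation, which is why Hessian-vector oracles are as cheap as the complexity bounds assume.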

### Policy Optimization with Stochastic Mirror Descent

• Computer Science
AAAI
• 2022
It is proved that the proposed VRMPO needs only $O(\epsilon^{-3})$ sample trajectories to achieve an $\epsilon$-approximate first-order stationary point, which matches the best sample complexity for policy optimization.

### An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient

• Computer Science
UAI
• 2019
An improved convergence analysis of SVRPG is provided, showing that it can find an $\epsilon$-approximate stationary point of the performance function within $O(1/\epsilon^{5/3})$ trajectories; this sample complexity improves upon the best known result.

### Momentum-Based Variance Reduction in Non-Convex SGD

• Computer Science
NeurIPS
• 2019
A new algorithm is presented, STORM, that does not require any batches and makes use of adaptive learning rates, enabling simpler implementation and less hyperparameter tuning.
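The batch-free estimator behind STORM can be sketched as a momentum average with a correction term that evaluates the same sample at the current and previous iterates. This is a hedged toy on a quadratic with fixed step sizes; STORM itself adapts the learning rate and momentum from observed gradient norms.

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.diag([1.0, 2.0])

def grad(x, xi):
    # Simulated single-sample stochastic gradient of f(x) = 0.5 * x^T A x.
    return A @ x + xi

# STORM-style update:
#   d_t = grad(x_t; xi_t) + (1 - a) * (d_{t-1} - grad(x_{t-1}; xi_t))
# The SAME sample xi_t appears at both iterates; that reuse is what
# cancels the variance without any large batch.
x = np.array([1.5, -2.0])
xi = rng.standard_normal(2) * 0.1
d = grad(x, xi)
lr, a = 0.1, 0.2
for t in range(300):
    x_prev, x = x, x - lr * d
    xi = rng.standard_normal(2) * 0.1          # fresh sample xi_t
    d = grad(x, xi) + (1 - a) * (d - grad(x_prev, xi))

print(np.linalg.norm(A @ x))  # true gradient norm at the final iterate
```

Setting a = 1 recovers plain SGD; smaller a mixes in the variance-reduced correction, which is why no checkpoint batches or inner loops are needed.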