Adaptive Momentum-Based Policy Gradient with Second-Order Information

Saber Salehkaleybar, Sadegh Khorasani, Negar Kiyavash, Niao He, Patrick Thiran
Variance-reduced gradient estimators for policy gradient methods have been one of the main focuses of research in reinforcement learning in recent years, as they allow acceleration of the estimation process. We propose a variance-reduced policy-gradient method, called SHARP, which incorporates second-order information into stochastic gradient descent (SGD) using momentum with a time-varying learning rate. The SHARP algorithm is parameter-free, achieving $\epsilon$-approximate first-order…
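The abstract's core idea, a momentum estimator corrected with Hessian-vector products under a time-varying learning rate, can be sketched on a toy problem. This is an illustrative reconstruction, not the authors' exact SHARP update; the momentum weight, step-size schedule, and noise level are all assumptions:

```python
import numpy as np

# Toy quadratic f(x) = 0.5 * x' A x; A @ v stands in for the stochastic
# Hessian-vector-product oracle. Constants are illustrative assumptions.
rng = np.random.default_rng(0)
A = np.diag([1.0, 2.0])

def stoch_grad(x):
    return A @ x + 0.01 * rng.standard_normal(2)   # noisy gradient oracle

def hvp(x, v):
    return A @ v                                    # Hessian-vector product

x = np.array([1.0, 1.0])
d = stoch_grad(x)                   # momentum-style gradient estimate
a = 0.1                             # momentum weight (assumed value)
for t in range(1, 201):
    eta = 0.2 / t ** (1 / 3)        # time-varying learning rate (assumed schedule)
    x_prev, x = x, x - eta * d
    # Transport the old estimate to the new iterate via a Hessian-vector
    # product, then blend in a fresh stochastic gradient.
    d = a * stoch_grad(x) + (1 - a) * (d + hvp(x_prev, x - x_prev))
```

On the quadratic the Hessian-vector correction keeps `d` tracking the true gradient exactly, so only the averaged noise remains; this is the variance-reduction effect the abstract alludes to.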
1 Citation


Decentralized Natural Policy Gradient with Variance Reduction for Collaborative Multi-Agent Reinforcement Learning

A novel decentralized natural policy gradient method, dubbed Momentum-based Decentralized Natural Policy Gradient (MDNPG), is proposed, which incorporates natural gradient, momentum-based variance reduction, and gradient tracking into the decentralized stochastic gradient ascent framework.



Simple statistical gradient-following algorithms for connectionist reinforcement learning

This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement, in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, without explicitly computing gradient estimates.
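The score-function idea summarized here is what is now commonly called REINFORCE; a minimal sketch on a two-armed bandit (illustrative only, with an assumed softmax policy, reward model, and step size) looks like:

```python
import numpy as np

# Two-armed bandit with a softmax policy over logits theta. Weights move
# along the score-function (log-likelihood) gradient scaled by reward,
# without ever forming an explicit estimate of the reward gradient.
rng = np.random.default_rng(0)
theta = np.zeros(2)
means = np.array([0.0, 1.0])        # arm 1 pays more on average (assumed)

for _ in range(2000):
    p = np.exp(theta) / np.exp(theta).sum()
    a = rng.choice(2, p=p)
    r = means[a] + rng.standard_normal()    # noisy scalar reward
    grad_log = -p
    grad_log[a] += 1.0                      # gradient of log pi(a | theta)
    theta += 0.05 * r * grad_log            # stochastic ascent step
p = np.exp(theta) / np.exp(theta).sum()     # learned policy
```

With enough samples the policy concentrates on the higher-paying arm, even though each update uses only the sampled action and reward.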

PAGE-PG: A Simple and Loopless Variance-Reduced Policy Gradient Method with Probabilistic Gradient Estimation

ProbAbilistic Gradient Estimation for Policy Gradient (PAGE-PG) is proposed: a novel loopless variance-reduced policy gradient method built on a probabilistic switch between two types of updates, using an unbiased gradient estimator inspired by the PAGE estimator for supervised learning.
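A hedged sketch of the probabilistic-switch idea, shown on a toy quadratic rather than the actual policy-gradient setting (the switch probability, step size, and noise levels are assumptions):

```python
import numpy as np

# With probability p_switch, refresh with a low-noise "large batch"
# gradient; otherwise apply a cheap two-point correction that evaluates
# the SAME sample at the current and previous iterates.
rng = np.random.default_rng(1)
A = np.diag([1.0, 3.0])

def noisy_grad(x, noise):
    return A @ x + noise * rng.standard_normal(2)

def grad_diff(x1, x2, noise):
    n = noise * rng.standard_normal(2)          # shared sample
    return (A @ x1 + n) - (A @ x2 + n)

x = np.array([2.0, -1.0])
d = noisy_grad(x, 0.001)            # initial "large batch" estimate
for _ in range(300):
    x_prev, x = x, x - 0.1 * d
    if rng.random() < 0.2:          # probabilistic switch
        d = noisy_grad(x, 0.001)    # occasional full refresh
    else:
        d = d + grad_diff(x, x_prev, 0.05)  # loopless cheap correction
```

Because the cheap update reuses one sample at both points, most of its noise cancels in the difference, which is what makes the loopless scheme variance-reduced.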

Better SGD using Second-order Momentum

A new algorithm is developed that finds an $\epsilon$-critical point in the optimal $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector product computations, and that leads to better gradient estimates in a manner analogous to variance-reduction methods.

Momentum-Based Policy Gradient Methods

A class of efficient momentum-based policy gradient methods is proposed for model-free reinforcement learning; these methods use adaptive learning rates, require no large batches, and reach the best known sample complexity of $O(\epsilon^{-3})$.

Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations

An algorithm is designed that finds an $\epsilon$-approximate stationary point using stochastic gradient and Hessian-vector products, and a lower bound is proved establishing that this rate is optimal and cannot be improved using stochastic $p$th-order methods for any $p \ge 2$, even when the first $p$ derivatives of the objective are Lipschitz.

Policy Optimization with Stochastic Mirror Descent

It is proved that the proposed VRMPO needs only $O(\epsilon^{-3})$ sample trajectories to achieve an $\epsilon$-approximate first-order stationary point, which matches the best sample complexity for policy optimization.
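A generic mirror-descent step with the negative-entropy mirror map (exponentiated gradient on the probability simplex) illustrates the update family this method belongs to; this is a textbook sketch, not VRMPO itself:

```python
import numpy as np

# Exponentiated-gradient step: a multiplicative update followed by
# renormalization, which is mirror descent under negative entropy.
def md_step(p, grad, eta):
    q = p * np.exp(-eta * grad)     # multiplicative update
    return q / q.sum()              # project back onto the simplex

p = np.ones(3) / 3                  # uniform starting point
loss_grad = np.array([1.0, 0.0, -1.0])   # assumed fixed loss gradient
for _ in range(50):
    p = md_step(p, loss_grad, 0.5)
```

The iterate stays a valid probability vector at every step, which is the point of choosing a mirror map matched to the simplex rather than doing Euclidean projection.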

An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient

An improved convergence analysis of SVRPG is provided, showing that it can find an $\epsilon$-approximate stationary point of the performance function within $O(1/\epsilon^{5/3})$ trajectories; this sample complexity improves upon the best known result.

Momentum-Based Variance Reduction in Non-Convex SGD

A new algorithm is presented, STORM, that does not require any batches and makes use of adaptive learning rates, enabling simpler implementation and less hyperparameter tuning.
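The STORM recursive-momentum estimator can be sketched as follows, on a toy quadratic; STORM's adaptive learning rate is simplified to a fixed step here, so treat all constants as assumptions:

```python
import numpy as np

# STORM-style recursive momentum: each iteration draws ONE fresh sample
# and evaluates it at both the current and previous iterates.
rng = np.random.default_rng(2)
A = np.diag([1.0, 4.0])

def grad_at(x, n):
    return A @ x + n            # gradient under a given noise sample

x = np.array([1.5, -2.0])
n = 0.05 * rng.standard_normal(2)
d = grad_at(x, n)               # initial estimate
a = 0.2                         # momentum weight (assumed value)
for _ in range(300):
    x_prev, x = x, x - 0.05 * d
    n = 0.05 * rng.standard_normal(2)
    # Recursive correction: the same sample at both iterates, so the
    # estimator tracks the gradient without any batching.
    d = grad_at(x, n) + (1 - a) * (d - grad_at(x_prev, n))
```

Reusing one sample at two points is what lets STORM get variance reduction without checkpoints or large batches, matching the abstract's claim.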

Hessian Aided Policy Gradient

This paper presents a Hessian-aided policy gradient method with the first improved sample complexity of $O(1/\epsilon^{1.5})$, which can be implemented in linear time with respect to the parameter dimension and is hence applicable to sophisticated DNN parameterizations.

Policy Gradient Methods for Reinforcement Learning with Function Approximation

This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.