Corpus ID: 239024759

On the Global Convergence of Momentum-based Policy Gradient

@article{Ding2021OnTG,
  title={On the Global Convergence of Momentum-based Policy Gradient},
  author={Yuhao Ding and Junzi Zhang and Javad Lavaei},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.10116}
}
Policy gradient (PG) methods are popular and efficient for large-scale reinforcement learning due to their relative stability and incremental nature. In recent years, the empirical success of PG methods has led to the development of a theoretical foundation for these methods. In this work, we generalize this line of research by studying the global convergence of stochastic PG methods with momentum terms, which have been demonstrated to be efficient recipes for improving PG methods. We study… 
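The momentum schemes studied in this line of work combine a fresh stochastic policy gradient estimate with a recursive correction of the previous search direction. The snippet below is a minimal sketch of one such STORM-style update, given as a generic illustration rather than the exact algorithm analyzed in the paper; the function name `momentum_pg_step`, the default hyperparameter values, and the omission of importance weighting between consecutive iterates are assumptions made for readability.

```python
def momentum_pg_step(theta, d_prev, grad_new, grad_prev, eta=0.01, beta=0.2):
    """One momentum-based (STORM-style) policy gradient ascent step.

    theta     : current policy parameters (e.g., a numpy array)
    d_prev    : previous momentum direction
    grad_new  : stochastic gradient estimate at theta (e.g., REINFORCE/GPOMDP)
    grad_prev : stochastic gradient estimate at the previous parameters,
                computed on the same trajectories as grad_new
                (importance weighting between iterates is omitted in this sketch)
    eta, beta : step size and momentum coefficient (illustrative values)
    """
    # Recursive momentum: the new estimate plus a variance-reducing correction.
    d_new = grad_new + (1.0 - beta) * (d_prev - grad_prev)
    # Gradient ascent on the expected return J(theta).
    theta_next = theta + eta * d_new
    return theta_next, d_new
```

Setting beta = 1 recovers the vanilla stochastic policy gradient step, which is one way to see the momentum update as a strict generalization of the plain one.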


References

SHOWING 1-10 OF 67 REFERENCES
On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method
TLDR: A Stochastic Incremental Variance-Reduced Policy Gradient (SIVR-PG) approach that improves a sequence of policies to provably converge to the global optimal solution and finds an $\epsilon$-optimal policy using $\tilde{O}(\epsilon^{-2})$ samples.
An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods
TLDR: This paper revisits and improves the convergence of policy gradient, natural PG (NPG) methods, and their variance-reduced variants under general smooth policy parametrizations, and proposes SRVR-NPG, which incorporates variance reduction into the NPG update.
A general sample complexity analysis of vanilla policy gradient
TLDR: This paper applies recent tools developed for the analysis of SGD in non-convex optimization to obtain convergence guarantees for both REINFORCE and GPOMDP under a smoothness assumption on the objective function and weak conditions on the second moment of the norm of the estimated gradient.
Momentum-Based Policy Gradient Methods
TLDR: A class of efficient momentum-based policy gradient methods for model-free reinforcement learning that use adaptive learning rates, do not require large batches, and reach the best known sample complexity of $O(\epsilon^{-3})$.
Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator
TLDR: This work bridges the gap by showing that (model-free) policy gradient methods globally converge to the optimal solution and are efficient (polynomially so in relevant problem-dependent quantities) with respect to their sample and computational complexities.
Infinite-Horizon Policy-Gradient Estimation
TLDR: GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward in Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies, is introduced; a minimal sketch of this estimator appears after the reference list.
Variational Policy Gradient Method for Reinforcement Learning with General Utilities
TLDR: A new Variational Policy Gradient Theorem for RL with general utilities is derived, which establishes that the parametrized policy gradient may be obtained as the solution of a stochastic saddle point problem involving the Fenchel dual of the utility function.
Hessian Aided Policy Gradient
TLDR: This paper presents a Hessian-aided policy gradient method with the first improved sample complexity of $O(1/\epsilon^{1.5})$, which can be implemented in linear time with respect to the parameter dimension and is hence applicable to sophisticated DNN parameterization.
Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines
TLDR: The experimental results indicate that action-dependent baselines allow for faster learning on standard reinforcement learning benchmarks and high-dimensional hand manipulation and synthetic tasks, and that the general idea of including additional information in baselines for improved variance reduction can be extended to partially observed and multi-agent tasks.
Global Optimality Guarantees For Policy Gradient Methods
TLDR: This work identifies structural properties -- shared by finite MDPs and several classic control problems -- which guarantee that the policy gradient objective function has no suboptimal local minima despite being non-convex.
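As context for the GPOMDP reference above, the following is a minimal sketch of a GPOMDP-style gradient estimator. The trajectory format of (state, action, reward) tuples, the callable name `grad_log_prob`, and the use of a discounted objective are illustrative assumptions, not the original paper's notation or code; the point being illustrated is that the reward at step t is weighted only by the score terms of actions taken at or before step t.

```python
import numpy as np

def gpomdp_gradient(trajectories, grad_log_prob, gamma=0.99):
    """GPOMDP-style estimate of the policy gradient (sketch).

    trajectories  : list of trajectories, each a list of (state, action, reward)
    grad_log_prob : callable (state, action) -> ndarray holding the gradient of
                    log pi_theta(action | state) with respect to theta
    gamma         : discount factor
    """
    total = None
    for traj in trajectories:
        cum_score = None  # running sum of grad log pi up to the current step
        for t, (state, action, reward) in enumerate(traj):
            g = np.asarray(grad_log_prob(state, action))
            cum_score = g if cum_score is None else cum_score + g
            # Reward at step t is weighted only by scores of actions up to step t.
            term = (gamma ** t) * reward * cum_score
            total = term if total is None else total + term
    return total / len(trajectories)
```

Compared with plain REINFORCE, which weights the entire trajectory's score by the total return, this causal weighting drops terms whose expectation is zero and typically yields a lower-variance estimate.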