Analysis and improvement of policy gradient estimation


Policy gradient is a useful model-free reinforcement learning approach, but it tends to suffer from instability of gradient estimates. In this paper, we analyze and improve the stability of policy gradient methods. We first prove that the variance of gradient estimates in the PGPE (policy gradients with parameter-based exploration) method is smaller than…
DOI: 10.1016/j.neunet.2011.09.005
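
To make the PGPE idea named in the abstract concrete, here is a minimal sketch of a parameter-based gradient estimate. It assumes the commonly used formulation in which policy parameters are drawn from a Gaussian with mean eta and standard deviation tau, each sampled parameter vector is evaluated once, and the gradient is the average of return times the gradient of the log-density. The names pgpe_gradient, toy_return, and target are hypothetical, and the quadratic toy return is an illustration only; none of this is taken from the paper itself.

```python
import numpy as np

def pgpe_gradient(return_fn, eta, tau, n_samples, rng):
    """Monte Carlo estimate of the gradient of expected return
    with respect to the Gaussian hyper-parameters (eta, tau)."""
    # Draw policy parameters theta ~ N(eta, diag(tau^2)), one row per sample.
    theta = eta + tau * rng.standard_normal((n_samples, eta.size))
    # Evaluate each sampled parameter vector once (hypothetical return function).
    returns = np.array([return_fn(t) for t in theta])
    diff = theta - eta
    # Gradients of the Gaussian log-density with respect to eta and tau.
    grad_log_eta = diff / tau**2
    grad_log_tau = (diff**2 - tau**2) / tau**3
    # Likelihood-ratio estimate: average of return times log-density gradient.
    weights = returns[:, None]
    grad_eta = (weights * grad_log_eta).mean(axis=0)
    grad_tau = (weights * grad_log_tau).mean(axis=0)
    return grad_eta, grad_tau

# Toy problem (an assumption for illustration): the return is highest at `target`.
target = np.array([1.0, -1.0])

def toy_return(theta):
    return -np.sum((theta - target) ** 2)

rng = np.random.default_rng(0)
eta = np.zeros(2)
tau = np.ones(2)
grad_eta, grad_tau = pgpe_gradient(toy_return, eta, tau, n_samples=20000, rng=rng)
print("estimated gradient w.r.t. eta:", grad_eta)
print("analytic gradient w.r.t. eta: ", -2.0 * (eta - target))
```

For this toy return the expected gradient with respect to eta has the closed form -2 (eta - target), so the printed estimate can be checked against it; the gap between the two shrinks as n_samples grows, which is the estimation noise the abstract refers to.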



Cite this paper

@article{Zhao2011AnalysisAI,
  title={Analysis and improvement of policy gradient estimation},
  author={Tingting Zhao and Hirotaka Hachiya and Gang Niu and Masashi Sugiyama},
  journal={Neural networks : the official journal of the International Neural Network Society},
  year={2011},
  volume={26},
  pages={118--129}
}