Infinite-Horizon Policy-Gradient Estimation with Variable Discount Factor for Markov Decision Process

Abstract

This paper proposes a novel infinite-horizon policy-gradient estimation method with a variable discount factor. The method addresses the imbalance between bias and variance that limits standard policy-gradient estimation methods by using an incremental sequence as the discount factor. Numerical experiments conducted on a Markov decision process have…
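The abstract does not give the estimator itself, so the following is only a minimal sketch of the general idea: an eligibility-trace policy-gradient estimator (in the style of GPOMDP) whose discount factor follows an increasing sequence instead of a constant. Everything in it is an assumption made for illustration, not the authors' algorithm: the schedule `discount_schedule`, the two-state toy MDP in `step`, the logistic policy, and all parameter values are hypothetical.

```python
import math
import random


def discount_schedule(t, beta_start=0.5, beta_end=0.9, rate=0.01):
    """Hypothetical increasing discount sequence, rising from beta_start toward beta_end."""
    return beta_end - (beta_end - beta_start) * math.exp(-rate * t)


def policy(theta, state):
    """Two-action logistic policy; theta[state] is the logit of choosing action 1."""
    p1 = 1.0 / (1.0 + math.exp(-theta[state]))
    return [1.0 - p1, p1]


def step(action, rng):
    """Toy two-state MDP (assumed): action 1 usually leads to state 1, which pays reward 1."""
    p_to_1 = 0.8 if action == 1 else 0.2
    next_state = 1 if rng.random() < p_to_1 else 0
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward


def estimate_gradient(theta, steps=100000, seed=0):
    """Running-average gradient estimate of the average reward with a time-varying discount."""
    rng = random.Random(seed)
    state = 0
    trace = [0.0, 0.0]  # eligibility trace of score-function terms
    grad = [0.0, 0.0]   # running average of reward * trace
    for t in range(steps):
        probs = policy(theta, state)
        action = 1 if rng.random() < probs[1] else 0
        beta = discount_schedule(t)  # variable discount factor for this step
        trace = [beta * z for z in trace]
        # Score of a logistic policy: d/d theta[state] log pi(action | state) = action - p1
        trace[state] += action - probs[1]
        state, reward = step(action, rng)
        grad = [g + (reward * z - g) / (t + 1) for g, z in zip(grad, trace)]
    return grad


if __name__ == "__main__":
    print(estimate_gradient([0.0, 0.0]))
```

In estimators of this kind, a small discount factor keeps the variance of the trace low but biases the estimate, while a factor close to 1 does the opposite. A schedule that starts small and grows is one way to trade these off over the course of a run.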

Cite this paper

@article{Bao2008InfiniteHorizonPE,
  title={Infinite-Horizon Policy-Gradient Estimation with Variable Discount Factor for Markov Decision Process},
  author={Bing-Kun Bao and Bao-qun Yin and Hong-Sheng Xi},
  journal={2008 3rd International Conference on Innovative Computing Information and Control},
  year={2008},
  pages={584-584}
}