Infinite-Horizon Policy-Gradient Estimation


Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems associated with policy degradation in value-function methods. In this paper we introduce GPOMDP, a simulation-based algorithm for generating a biased estimate… (More)
DOI: 10.1613/jair.806


2 Figures and Tables


Citations per Year

534 Citations

Semantic Scholar estimates that this publication has 534 citations based on the available data.

See our FAQ for additional information.