Bayesian actor-critic algorithms

Mohammad Ghavamzadeh and Yaakov Engel
We present a new actor-critic learning model that uses a Bayesian class of non-parametric critics based on Gaussian process temporal difference learning. Such critics model the state-action value function as a Gaussian process, which allows Bayes' rule to be used to compute the posterior distribution over state-action value functions, conditioned on the observed data. Appropriate choices of the prior covariance (kernel) between state-action values and of the parametrization of the policy…
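The idea of placing a Gaussian process prior on the state-action value function and conditioning on observed rewards can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a squared-exponential kernel, i.i.d. Gaussian noise of variance `noise_var`, and a single trajectory in which each reward satisfies r_t = Q(z_t) - gamma * Q(z_{t+1}) + noise. The names `rbf_kernel` and `gptd_posterior` and all parameter values are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)


def gptd_posterior(Z, rewards, Z_query, gamma=0.9, noise_var=0.1):
    """Posterior mean and covariance of Q at the query points.

    Z       : (T+1, d) array of visited state-action feature vectors
    rewards : (T,) array, reward observed between Z[t] and Z[t+1]
    Z_query : (m, d) array of points at which to evaluate the posterior
    Assumed model (illustrative): r_t = Q(z_t) - gamma * Q(z_{t+1}) + noise_t.
    """
    T = len(rewards)
    # H maps the vector of Q-values along the trajectory to temporal differences.
    H = np.zeros((T, T + 1))
    idx = np.arange(T)
    H[idx, idx] = 1.0
    H[idx, idx + 1] = -gamma

    K = rbf_kernel(Z, Z)            # prior covariance of Q on the trajectory
    K_q = rbf_kernel(Z, Z_query)    # cross-covariance, shape (T+1, m)
    G = H @ K @ H.T + noise_var * np.eye(T)  # covariance of the rewards

    # Standard Gaussian conditioning (Bayes' rule for jointly Gaussian variables).
    mean = K_q.T @ H.T @ np.linalg.solve(G, rewards)
    cov = rbf_kernel(Z_query, Z_query) - K_q.T @ H.T @ np.linalg.solve(G, H @ K_q)
    return mean, cov


# Usage on synthetic data (random features and rewards, for shape checking only).
rng = np.random.default_rng(0)
Z = rng.normal(size=(6, 3))
rewards = rng.normal(size=5)
mean, cov = gptd_posterior(Z, rewards, Z[:2])
```

The posterior covariance returned here is what distinguishes a Bayesian critic from a point estimate: it quantifies how uncertain the value estimate is at each queried state-action pair.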
This paper has 48 citations.