Bayesian Policy Gradient and Actor-Critic Algorithms

Authors: Mohammad Ghavamzadeh, Yaakov Engel, Michal Valko
Journal: Journal of Machine Learning Research
Policy gradient methods are reinforcement learning algorithms that adapt a parameterized policy by following a performance gradient estimate. Many conventional policy gradient methods use Monte-Carlo techniques to estimate this gradient. The policy is improved by adjusting the parameters in the direction of the gradient estimate. Since Monte-Carlo methods tend to have high variance, a large number of samples is required to attain accurate estimates, resulting in slow convergence. In this paper…
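To make the variance issue concrete, here is a minimal sketch of a conventional Monte-Carlo (score-function, REINFORCE-style) gradient estimate. It does not implement the Bayesian method of the paper. The one-parameter Bernoulli policy, the toy reward (the action value plus unit Gaussian noise), and all function names are assumptions chosen purely for illustration.

```python
import math
import random
import statistics


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mc_policy_gradient(theta, n_samples, rng):
    """Monte-Carlo score-function estimate of dJ/dtheta.

    Assumed toy setup: the policy picks action 1 with probability
    sigmoid(theta), and the reward is the action plus N(0, 1) noise.
    """
    p = sigmoid(theta)
    total = 0.0
    for _ in range(n_samples):
        action = 1 if rng.random() < p else 0
        reward = float(action) + rng.gauss(0.0, 1.0)  # hypothetical reward
        score = action - p  # d/dtheta of log pi(action | theta)
        total += reward * score
    return total / n_samples


def true_gradient(theta):
    """Exact gradient for the toy problem: J(theta) = sigmoid(theta)."""
    p = sigmoid(theta)
    return p * (1.0 - p)


def estimator_std(theta, n_samples, n_repeats, rng):
    """Empirical standard deviation of the estimate across repeated runs."""
    estimates = [mc_policy_gradient(theta, n_samples, rng)
                 for _ in range(n_repeats)]
    return statistics.stdev(estimates)


rng = random.Random(0)
theta = 0.0
std_small = estimator_std(theta, 10, 200, rng)    # few samples per estimate
std_large = estimator_std(theta, 1000, 200, rng)  # many samples per estimate
print("true gradient:", true_gradient(theta))
print("std with 10 samples:", std_small)
print("std with 1000 samples:", std_large)
```

In this toy setting the spread of the estimate shrinks only in proportion to one over the square root of the sample count, which is the slow convergence the abstract refers to.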