Bayesian Policy Gradient and Actor-Critic Algorithms

@article{Ghavamzadeh2016BayesianPG,
  title={Bayesian Policy Gradient and Actor-Critic Algorithms},
  author={Mohammad Ghavamzadeh and Yaakov Engel and Michal Valko},
  journal={Journal of Machine Learning Research},
  year={2016},
  volume={17},
  pages={66:1--66:53}
}
Policy gradient methods are reinforcement learning algorithms that adapt a parameterized policy by following a performance gradient estimate. Many conventional policy gradient methods use Monte-Carlo techniques to estimate this gradient. The policy is improved by adjusting the parameters in the direction of the gradient estimate. Since Monte-Carlo methods tend to have high variance, a large number of samples is required to attain accurate estimates, resulting in slow convergence. In this paper…
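
The variance problem the abstract describes can be illustrated with a minimal sketch of a conventional Monte-Carlo (score-function) gradient estimate. This is not the Bayesian method of the paper. Everything here is an assumption made for illustration: a one-step problem, a Gaussian policy with mean theta and unit variance, a quadratic reward peaking at action 3, and the hypothetical names reward, mc_policy_gradient, and estimate_spread. For this toy setup the true gradient at theta = 0 is 6, so the spread of the estimates around that value can be compared across sample sizes.

```python
import random
import statistics


def reward(action):
    # Assumed toy reward: highest when the action equals 3.
    return -(action - 3.0) ** 2


def mc_policy_gradient(theta, num_samples, rng):
    # Score-function estimate of dJ/dtheta for a policy a ~ N(theta, 1).
    # For a unit-variance Gaussian, d/dtheta log p(a | theta) = a - theta.
    total = 0.0
    for _ in range(num_samples):
        action = rng.gauss(theta, 1.0)
        score = action - theta
        total += reward(action) * score
    return total / num_samples


def estimate_spread(theta, num_samples, repeats, seed=0):
    # Repeat the estimate many times to measure how much it fluctuates.
    rng = random.Random(seed)
    estimates = [
        mc_policy_gradient(theta, num_samples, rng) for _ in range(repeats)
    ]
    return statistics.mean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    for n in (10, 1000):
        mean, std = estimate_spread(theta=0.0, num_samples=n, repeats=200)
        print(f"samples={n:5d}  mean estimate={mean:7.3f}  std={std:6.3f}")
```

Running the script prints the mean and standard deviation of repeated estimates for a small and a large sample size. The standard deviation shrinks roughly with the square root of the number of samples, which is why accurate Monte-Carlo gradient estimates need many samples.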