Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines

Cathy Wu, Aravind Rajeswaran, Yan Duan, Vikash Kumar, Alexandre M. Bayen, Sham M. Kakade, Igor Mordatch, Pieter Abbeel
Policy gradient methods have enjoyed great success in deep reinforcement learning but suffer from high variance of gradient estimates. The high-variance problem is particularly exacerbated in problems with long horizons or high-dimensional action spaces. To mitigate this issue, we derive a bias-free action-dependent baseline for variance reduction which fully exploits the structural form of the stochastic policy itself and does not make any additional assumptions about the MDP.
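The core idea can be sketched as follows: for a policy that factorizes across action dimensions, the gradient term for dimension i may use a baseline that depends on the state and the *other* action dimensions a_{-i} without introducing bias. The sketch below is a minimal, hypothetical illustration (the function names `pg_estimate`, `mu_fn`, and `baseline_fn` are assumptions, not the paper's code) using a factorized Gaussian policy with fixed unit variance.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_log_gaussian(a, mu, sigma):
    # Per-dimension score function of a factorized Gaussian policy:
    # d log pi_i(a_i | s) / d mu_i = (a_i - mu_i) / sigma^2
    return (a - mu) / sigma**2

def pg_estimate(states, actions, returns, mu_fn, sigma, baseline_fn):
    """Per-dimension policy-gradient estimate with an action-dependent baseline.
    For action dimension i, the baseline is allowed to depend on the state and
    the other action dimensions a_{-i}; because a_i is integrated out of the
    baseline term, the estimator remains unbiased."""
    grads = []
    for s, a, R in zip(states, actions, returns):
        mu = mu_fn(s)
        score = grad_log_gaussian(a, mu, sigma)   # shape (action_dim,)
        adv = np.empty_like(a)
        for i in range(a.size):
            a_minus_i = np.delete(a, i)           # the other action dimensions
            adv[i] = R - baseline_fn(s, a_minus_i, i)
        grads.append(score * adv)
    # Monte Carlo average of the per-dimension gradient terms
    return np.mean(grads, axis=0)

# Toy demo with random data and a trivial (hypothetical) baseline that
# averages the other action dimensions.
states = rng.normal(size=(4, 3))
actions = rng.normal(size=(4, 2))
returns = rng.normal(size=4)
mu_fn = lambda s: np.zeros(2)
baseline_fn = lambda s, a_rest, i: float(a_rest.mean())
g = pg_estimate(states, actions, returns, mu_fn, 1.0, baseline_fn)
```

In practice the baseline would be a learned function (e.g. a neural network) of the state and the remaining action dimensions; the point of the sketch is only the per-dimension structure of the advantage term.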


