Corpus ID: 221555310

Phasic Policy Gradient

@article{Cobbe2020PhasicPG,
  title={Phasic Policy Gradient},
  author={K. Cobbe and J. Hilton and O. Klimov and John Schulman},
  journal={ArXiv},
  year={2020},
  volume={abs/2009.04416}
}
  • K. Cobbe, J. Hilton, O. Klimov, and John Schulman
  • Published 2020
  • Computer Science, Mathematics
  • ArXiv
  • We introduce Phasic Policy Gradient (PPG), a reinforcement learning framework which modifies traditional on-policy actor-critic methods by separating policy and value function training into distinct phases. In prior methods, one must choose between using a shared network or separate networks to represent the policy and value function. Using separate networks avoids interference between objectives, while using a shared network allows useful features to be shared. PPG is able to achieve the best of both worlds by splitting optimization into two phases, one that advances training and one that distills features. PPG also enables the value function to be more aggressively optimized with a higher level of sample reuse. Compared to PPO, we find that PPG significantly improves sample efficiency on the challenging Procgen Benchmark.
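
The abstract only outlines the idea, so here is a minimal PyTorch sketch of what the two-phase training structure described above could look like. It is not the authors' released implementation: the network sizes, the hyperparameters (number of policy-phase iterations, auxiliary epochs, the behavioral-cloning coefficient), and the synthetic rollout batches are illustrative assumptions standing in for real environment rollouts and value targets.

```python
# Minimal sketch of the phasic training structure described above.
# NOT the authors' released code; layer sizes, hyperparameters, and the
# synthetic rollout batches below are illustrative assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PolicyNet(nn.Module):
    """Policy network with an auxiliary value head (used only in the auxiliary phase)."""

    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.pi_head = nn.Linear(hidden, n_actions)   # action logits
        self.aux_head = nn.Linear(hidden, 1)          # auxiliary value head

    def forward(self, obs):
        h = self.body(obs)
        return self.pi_head(h), self.aux_head(h).squeeze(-1)


class ValueNet(nn.Module):
    """Separate value network, optimized in both phases."""

    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.v = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, obs):
        return self.v(obs).squeeze(-1)


def policy_phase_losses(policy, value, batch, clip=0.2):
    """Policy phase: clipped PPO surrogate for the policy, MSE for the value network."""
    logits, _ = policy(batch["obs"])
    logp = torch.distributions.Categorical(logits=logits).log_prob(batch["actions"])
    ratio = torch.exp(logp - batch["old_logp"])
    adv = batch["advantages"]
    pi_loss = -torch.min(ratio * adv, torch.clamp(ratio, 1 - clip, 1 + clip) * adv).mean()
    v_loss = F.mse_loss(value(batch["obs"]), batch["value_targets"])
    return pi_loss, v_loss


def auxiliary_phase_losses(policy, value, batch, beta_clone=1.0):
    """Auxiliary phase: distill value targets into the policy body through the auxiliary
    head while a behavioral-cloning KL keeps the policy close to its pre-phase self."""
    logits, aux_v = policy(batch["obs"])
    aux_loss = 0.5 * F.mse_loss(aux_v, batch["value_targets"])
    kl = F.kl_div(F.log_softmax(logits, dim=-1), batch["old_probs"], reduction="batchmean")
    v_loss = F.mse_loss(value(batch["obs"]), batch["value_targets"])
    return aux_loss + beta_clone * kl, v_loss


if __name__ == "__main__":
    obs_dim, n_actions, batch_size = 8, 4, 64
    policy, value = PolicyNet(obs_dim, n_actions), ValueNet(obs_dim)
    pi_opt = torch.optim.Adam(policy.parameters(), lr=5e-4)
    v_opt = torch.optim.Adam(value.parameters(), lr=5e-4)

    for phase in range(3):          # outer loop: several policy iterations, then one aux phase
        buffer = []                 # states gathered here are reused in the auxiliary phase
        for _ in range(4):          # stand-in for the N_pi policy-phase iterations
            # Synthetic batch; a real agent would use environment rollouts and GAE targets.
            batch = {
                "obs": torch.randn(batch_size, obs_dim),
                "actions": torch.randint(0, n_actions, (batch_size,)),
                "advantages": torch.randn(batch_size),
                "value_targets": torch.randn(batch_size),
            }
            with torch.no_grad():
                logits, _ = policy(batch["obs"])
                batch["old_logp"] = torch.distributions.Categorical(logits=logits).log_prob(batch["actions"])
            pi_loss, v_loss = policy_phase_losses(policy, value, batch)
            pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()
            v_opt.zero_grad(); v_loss.backward(); v_opt.step()
            buffer.append(batch)

        with torch.no_grad():       # snapshot the current policy for the KL constraint
            for b in buffer:
                b["old_probs"] = F.softmax(policy(b["obs"])[0], dim=-1)
        for _ in range(2):          # stand-in for E_aux auxiliary epochs (higher sample reuse)
            for b in buffer:
                aux_loss, v_loss = auxiliary_phase_losses(policy, value, b)
                pi_opt.zero_grad(); aux_loss.backward(); pi_opt.step()
                v_opt.zero_grad(); v_loss.backward(); v_opt.step()
```

In a real agent the synthetic batches would come from on-policy rollouts, and the auxiliary phase would replay every state collected during the preceding policy phase, which is what lets the value function be optimized with far more sample reuse than the policy.
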
    1 Citation


    Prioritized Level Replay

    References

    Showing 1-10 of 21 references

    Proximal Policy Optimization Algorithms (2,982 citations)
    V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control (22 citations; highly influential)
    The Impact of Non-stationarity on Generalisation in Deep Reinforcement Learning (4 citations)
    What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study (9 citations)
    Continuous control with deep reinforcement learning (3,781 citations; highly influential)
    IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures (480 citations)
    Trust Region Policy Optimization (2,553 citations)
    Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor (997 citations; highly influential)
    High-Dimensional Continuous Control Using Generalized Advantage Estimation (1,014 citations)