• Publications
Proximal Policy Optimization Algorithms
TLDR
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.
  • 3,673
  • 1,082
Exploration by Random Network Distillation
TLDR
We introduce an exploration bonus for deep reinforcement learning methods that is easy to implement and adds minimal overhead to the computation performed.
  • 326
  • 99
Quantifying Generalization in Reinforcement Learning
TLDR
In this paper, we investigate the problem of overfitting in deep reinforcement learning.
  • 211
  • 41
Gotta Learn Fast: A New Benchmark for Generalization in RL
TLDR
In this report, we present a new reinforcement learning benchmark based on the Sonic the Hedgehog (TM) video game franchise.
  • 89
  • 13
Phasic Policy Gradient
TLDR
We introduce Phasic Policy Gradient (PPG), a reinforcement learning framework which modifies traditional on-policy actor-critic methods by separating policy and value function training into distinct phases, one that advances training and one that distills features.
  • 5
  • 1