Proximal Policy Optimization Algorithms
- J. Schulman, F. Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov
- Computer Science, ArXiv
- 20 July 2017
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective…
Dota 2 with Large Scale Deep Reinforcement Learning
- Christopher Berner, Greg Brockman, Susan Zhang
- Computer Science, ArXiv
- 13 December 2019
By defeating the Dota 2 world champion (Team OG), OpenAI Five demonstrates that self-play reinforcement learning can achieve superhuman performance on a difficult task.
Evolved Policy Gradients
- Rein Houthooft, Richard Y. Chen, P. Abbeel
- Computer Science, Neural Information Processing Systems
- 12 February 2018
Empirical results show that the evolved policy gradient algorithm (EPG) learns faster on several randomized environments than an off-the-shelf policy gradient method, that its learned loss can generalize to out-of-distribution test-time tasks, and that it exhibits qualitatively different behavior from other popular metalearning algorithms.
Long-Term Planning and Situational Awareness in OpenAI Five
- Jonathan Raiman, Susan Zhang, F. Wolski
- Computer Science, ArXiv
- 13 December 2019
The agent is shown to learn situational similarity across actions, and there is evidence of planning towards subgoals minutes before they are executed. A qualitative analysis of these predictions is performed on the games against the Dota 2 world champions OG in April 2019.