Proximal Policy Optimization Algorithms
- J. Schulman, F. Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov
- Computer Science · ArXiv
- 20 July 2017
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective…
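The "surrogate" objective mentioned above is PPO's clipped objective. A minimal NumPy sketch (variable names and the example `eps=0.2` clip range are illustrative, not taken from the paper's code):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective in the style of PPO.

    ratio:     probability ratios pi_new(a|s) / pi_old(a|s)
    advantage: advantage estimates for the sampled actions
    eps:       clip range around a ratio of 1
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Take the pessimistic (elementwise minimum) bound, then average.
    return np.mean(np.minimum(unclipped, clipped))
```

Clipping removes the incentive to move the ratio far outside `[1 - eps, 1 + eps]`, which is what allows multiple epochs of minibatch updates on the same sampled data.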
Trust Region Policy Optimization
- J. Schulman, S. Levine, P. Abbeel, Michael I. Jordan, Philipp Moritz
- Computer Science · ICML
- 19 February 2015
A method for optimizing control policies with guaranteed monotonic improvement; several approximations to the theoretically justified scheme yield a practical algorithm, called Trust Region Policy Optimization (TRPO).
InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
- Xi Chen, Yan Duan, Rein Houthooft, J. Schulman, Ilya Sutskever, P. Abbeel
- Computer Science · NIPS
- 12 June 2016
Experiments show that InfoGAN learns interpretable representations that are competitive with representations learned by existing fully supervised methods.
High-Dimensional Continuous Control Using Generalized Advantage Estimation
This work addresses the large number of samples typically required and the difficulty of obtaining stable and steady improvement despite the nonstationarity of the incoming data by using value functions to substantially reduce the variance of policy gradient estimates at the cost of some bias.
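The bias–variance trade-off described above is controlled by the GAE parameters gamma and lambda. A minimal sketch of the estimator, assuming a `values` array with one bootstrap entry beyond the trajectory (parameter names are illustrative):

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation.

    rewards: length-T array of rewards
    values:  length-(T+1) array of value estimates (last entry bootstraps)
    Computes A_t = sum_l (gamma*lam)^l * delta_{t+l}, where
    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
    """
    T = len(rewards)
    adv = np.zeros(T)
    last = 0.0
    # Work backwards so each step reuses the accumulated tail sum.
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        last = delta + gamma * lam * last
        adv[t] = last
    return adv
```

With `lam=0` the estimator reduces to the one-step TD residual (low variance, more bias); with `lam=1` it approaches the Monte Carlo advantage (high variance, less bias).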
OpenAI Gym
This whitepaper discusses the components of OpenAI Gym and the design decisions that went into the software.
On First-Order Meta-Learning Algorithms
A family of algorithms for learning a parameter initialization that can be fine-tuned quickly on a new task, using only first-order derivatives for the meta-learning updates, including Reptile, which works by repeatedly sampling a task, training on it, and moving the initialization towards the trained weights on that task.
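The Reptile update described above (train on a sampled task, then move the initialization toward the trained weights) can be sketched as follows; `train_on_task` is a hypothetical stand-in for a few steps of SGD on one task:

```python
import numpy as np

def reptile_step(init_params, train_on_task, step_size=0.1):
    """One Reptile meta-update.

    init_params:   current parameter initialization (NumPy array)
    train_on_task: maps params -> task-adapted params (e.g. a few SGD steps)
    step_size:     outer-loop step toward the adapted weights
    """
    adapted = train_on_task(init_params)
    # Move the initialization toward the weights trained on this task.
    return init_params + step_size * (adapted - init_params)
```

Repeating this over many sampled tasks yields an initialization that fine-tunes quickly, using only first-order derivatives inside `train_on_task`.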
Theano: A Python framework for fast computation of mathematical expressions
The performance of Theano is compared against Torch7 and TensorFlow on several machine learning models and recently-introduced functionalities and improvements are discussed.
Benchmarking Deep Reinforcement Learning for Continuous Control
This work presents a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure.
RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning
- Yan Duan, J. Schulman, Xi Chen, P. Bartlett, Ilya Sutskever, P. Abbeel
- Computer Science · ArXiv
- 4 November 2016
This paper proposes to represent a "fast" reinforcement learning algorithm as a recurrent neural network (RNN) and learn it from data; the fast algorithm is encoded in the weights of the RNN, which are learned slowly through a general-purpose ("slow") RL algorithm.
Variational Lossy Autoencoder
This paper presents a simple but principled method to learn global representations by combining the Variational Autoencoder (VAE) with neural autoregressive models such as RNNs, MADE, and PixelRNN/CNN, greatly improving the generative modeling performance of VAEs.