• Publications
  • Influence
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learningExpand
Trust Region Policy Optimization
A method for optimizing control policies, with guaranteed monotonic improvement, by making several approximations to the theoretically-justified scheme, called Trust Region Policy Optimization (TRPO). Expand
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
This paper proposes soft actor-critic, an off-policy actor-Critic deep RL algorithm based on the maximum entropy reinforcement learning framework, and achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off- policy methods. Expand
InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
Experiments show that InfoGAN learns interpretable representations that are competitive with representations learned by existing fully supervised methods. Expand
High-Dimensional Continuous Control Using Generalized Advantage Estimation
This work addresses the large number of samples typically required and the difficulty of obtaining stable and steady improvement despite the nonstationarity of the incoming data by using value functions to substantially reduce the variance of policy gradient estimates at the cost of some bias. Expand
Apprenticeship learning via inverse reinforcement learning
This work thinks of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert, based on using "inverse reinforcement learning" to try to recover the unknown reward function. Expand
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
An adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination is presented. Expand
Hindsight Experience Replay
A novel technique is presented which allows sample-efficient learning from rewards which are sparse and binary and therefore avoid the need for complicated reward engineering and may be seen as a form of implicit curriculum. Expand
Soft Actor-Critic Algorithms and Applications
Soft Actor-Critic (SAC), the recently introduced off-policy actor-critic algorithm based on the maximum entropy RL framework, achieves state-of-the-art performance, outperforming prior on-policy and off- policy methods in sample-efficiency and asymptotic performance. Expand
Benchmarking Deep Reinforcement Learning for Continuous Control
This work presents a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, task with partial observations, and tasks with hierarchical structure. Expand