• Publications (sorted by influence)
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
TLDR
We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning.
  • Citations: 2,486
  • Highly influential citations: 731
Trust Region Policy Optimization
TLDR
In this article, we describe a method for optimizing control policies, with guaranteed monotonic improvement, with little tuning of hyperparameters.
  • Citations: 2,559
  • Highly influential citations: 585
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
TLDR
An off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework achieves state-of-the-art performance on a range of continuous control benchmark tasks.
  • Citations: 1,005
  • Highly influential citations: 301
High-Dimensional Continuous Control Using Generalized Advantage Estimation
TLDR
We use value functions to substantially reduce the variance of policy gradient estimates at the cost of some bias, with an exponentially-weighted estimator of the advantage function.
  • Citations: 1,015
  • Highly influential citations: 240
Reinforcement Learning with Deep Energy-Based Policies
TLDR
We propose a method for learning expressive energy-based policies for continuous states and actions, which has been feasible only in tabular domains before.
  • Citations: 427
  • Highly influential citations: 82
Recurrent Network Models for Human Dynamics
TLDR
We propose the Encoder-Recurrent-Decoder (ERD) model for recognition and prediction of human body pose in videos and motion capture.
  • Citations: 379
  • Highly influential citations: 77
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
TLDR
We propose a new algorithm called probabilistic ensembles with trajectory sampling (PETS) that combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation.
  • Citations: 293
  • Highly influential citations: 75
Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection
TLDR
We describe a learning-based approach to hand-eye coordination for robotic grasping from monocular images, which we demonstrate on a robotic grasping task.
  • Citations: 1,013
  • Highly influential citations: 73
Soft Actor-Critic Algorithms and Applications
TLDR
We introduce Soft Actor-Critic, an off-policy actor-critic algorithm based on the maximum entropy RL framework that achieves state-of-the-art performance in sample-efficiency and asymptotic performance.
  • Citations: 261
  • Highly influential citations: 70
Continuous Deep Q-Learning with Model-based Acceleration
TLDR
In this paper, we explore algorithms and representations to reduce the sample complexity of deep reinforcement learning for continuous control tasks, and demonstrate substantially faster learning on domains where such models are applicable.
  • Citations: 577
  • Highly influential citations: 65