• Publications
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
TLDR
We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning.
  • 3,210
  • 934
Trust Region Policy Optimization
TLDR
In this article, we describe a method for optimizing control policies, with guaranteed monotonic improvement, with little tuning of hyperparameters.
  • 2,928
  • 639
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
TLDR
An off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework achieves state-of-the-art performance on a range of continuous control benchmark tasks.
  • 1,398
  • 375
High-Dimensional Continuous Control Using Generalized Advantage Estimation
TLDR
We use value functions to substantially reduce the variance of policy gradient estimates at the cost of some bias, with an exponentially-weighted estimator of the advantage function.
  • 1,184
  • 274
Soft Actor-Critic Algorithms and Applications
TLDR
We introduce Soft Actor-Critic, an off-policy actor-critic algorithm based on the maximum entropy RL framework that achieves state-of-the-art performance in sample-efficiency and asymptotic performance.
  • 356
  • 100
Reinforcement Learning with Deep Energy-Based Policies
TLDR
We propose a method for learning expressive energy-based policies for continuous states and actions, which was previously feasible only in tabular domains.
  • 497
  • 90
Recurrent Network Models for Human Dynamics
TLDR
We propose the Encoder-Recurrent-Decoder (ERD) model for recognition and prediction of human body pose in videos and motion capture.
  • 431
  • 88
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
TLDR
We propose a new algorithm called probabilistic ensembles with trajectory sampling (PETS) that combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation.
  • 378
  • 84
Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection
TLDR
We describe a learning-based approach to hand-eye coordination for robotic grasping from monocular images, which we demonstrate on a robotic grasping task.
  • 1,128
  • 76
End-to-End Training of Deep Visuomotor Policies
TLDR
We develop a method that can be used to learn policies that map raw image observations directly to torques at the robot's motors.
  • 2,080
  • 73