Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
- Tuomas Haarnoja, Aurick Zhou, P. Abbeel, S. Levine
- Computer Science · International Conference on Machine Learning
- 4 January 2018
This paper proposes soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework, and achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods.
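For context on the entry above: SAC optimizes the maximum entropy objective, in which the standard return is augmented with the policy's entropy weighted by a temperature α (notation follows the paper):

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \Big[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big]
```

Maximizing entropy alongside reward encourages exploration and yields a stochastic actor that is more robust to perturbations than a deterministic one.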
Soft Actor-Critic Algorithms and Applications
- Tuomas Haarnoja, Aurick Zhou, S. Levine
- Computer Science · arXiv
- 13 December 2018
Soft Actor-Critic (SAC), the recently introduced off-policy actor-critic algorithm based on the maximum entropy RL framework, achieves state-of-the-art performance, outperforming prior on-policy and off-policy methods in sample efficiency and asymptotic performance.
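One practical extension this follow-up paper introduces is automatic tuning of the entropy temperature α by gradient descent on a dual objective. Below is a minimal PyTorch-style sketch of that update, assuming a continuous action space; the variable names are illustrative, not the authors' code:

```python
import torch

action_dim = 6                       # e.g., a 6-DoF continuous control task
target_entropy = -float(action_dim)  # common heuristic: -|A|

# alpha is parameterized as exp(log_alpha) so it stays positive
log_alpha = torch.zeros(1, requires_grad=True)
alpha_opt = torch.optim.Adam([log_alpha], lr=3e-4)

def update_temperature(log_probs: torch.Tensor) -> float:
    """log_probs: log pi(a|s) for a batch of actions sampled from the policy."""
    alpha_loss = -(log_alpha.exp() * (log_probs + target_entropy).detach()).mean()
    alpha_opt.zero_grad()
    alpha_loss.backward()
    alpha_opt.step()
    return log_alpha.exp().item()  # updated temperature for actor/critic losses
```

Intuitively, α grows when the policy's entropy falls below the target and shrinks otherwise, removing the need to hand-tune the entropy bonus per task.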
Conservative Q-Learning for Offline Reinforcement Learning
- Aviral Kumar, Aurick Zhou, G. Tucker, S. Levine
- Computer Science · Neural Information Processing Systems
- 8 June 2020
Conservative Q-learning (CQL) is proposed, which aims to address limitations of offline RL methods by learning a conservative Q-function such that the expected value of a policy under this Q-function lower-bounds its true value.
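Concretely, CQL augments a standard Bellman error with a regularizer that pushes down Q-values on actions from a chosen distribution μ while pushing up Q-values on actions actually present in the dataset 𝒟 (this is the CQL(μ) form from the paper, with trade-off weight α and target network Q̄):

```latex
\min_{Q} \;\; \alpha \Big( \mathbb{E}_{s \sim \mathcal{D},\, a \sim \mu(\cdot \mid s)}\big[ Q(s,a) \big]
  - \mathbb{E}_{(s,a) \sim \mathcal{D}}\big[ Q(s,a) \big] \Big)
  \;+\; \tfrac{1}{2}\, \mathbb{E}_{(s,a,s') \sim \mathcal{D}}\Big[ \big( Q(s,a) - \hat{\mathcal{B}}^{\pi} \bar{Q}(s,a) \big)^{2} \Big]
```

The first term is what makes the learned Q-function conservative: it penalizes overestimation on out-of-distribution actions, which is the failure mode that standard off-policy methods exhibit on fixed datasets.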
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
- Kate Rakelly, Aurick Zhou, Deirdre Quillen, Chelsea Finn, S. Levine
- Computer Science · International Conference on Machine Learning
- 19 March 2019
This paper develops an off-policy meta-RL algorithm that disentangles task inference and control and performs online probabilistic filtering of latent task variables to infer how to solve a new task from small amounts of experience.
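The "probabilistic filtering" in this entry has a simple closed form when, as in the paper, the posterior over the latent task variable z is built from independent Gaussian factors, one per context transition. A sketch of that aggregation step, assuming per-transition means and standard deviations produced by an encoder network (function and variable names are ours):

```python
import numpy as np

def aggregate_context(mus, sigmas):
    """mus, sigmas: (N, latent_dim) arrays from a per-transition encoder.
    The product of N independent Gaussian factors is itself Gaussian,
    with precision-weighted mean and summed precision."""
    precisions = 1.0 / np.square(sigmas)
    post_var = 1.0 / precisions.sum(axis=0)
    post_mu = post_var * (precisions * mus).sum(axis=0)
    return post_mu, np.sqrt(post_var)

# Filtering new experience online just extends the product with more factors.
mu, sigma = aggregate_context(np.random.randn(10, 5), np.ones((10, 5)))
z = mu + sigma * np.random.randn(5)  # sample a task hypothesis for the policy
```

Because the posterior narrows as more transitions arrive, sampling z from it gives temporally extended, posterior-sampling-style exploration on a new task.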
Composable Deep Reinforcement Learning for Robotic Manipulation
- Tuomas Haarnoja, Vitchyr H. Pong, Aurick Zhou, Murtaza Dalal, P. Abbeel, S. Levine
- Computer Science · IEEE International Conference on Robotics and Automation
- 19 March 2018
This paper shows that policies learned with soft Q-learning can be composed to create new policies, and that the optimality of the resulting policy can be bounded in terms of the divergence between the composed policies.
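The composition rule behind this result, in the soft Q-learning convention where the policy is a Boltzmann distribution over its soft Q-function (temperature omitted for brevity): averaging the constituent soft Q-functions gives an approximate soft Q-function for the combined task, and the paper bounds the gap to the true combined optimum in terms of the divergence between the constituent policies.

```latex
Q_{\Sigma}(s,a) \;\approx\; \frac{1}{K} \sum_{i=1}^{K} Q_i(s,a),
\qquad
\pi_{\Sigma}(a \mid s) \;\propto\; \exp\big( Q_{\Sigma}(s,a) \big)
```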
Learning to Walk via Deep Reinforcement Learning
- Tuomas Haarnoja, Aurick Zhou, Sehoon Ha, Jie Tan, G. Tucker, S. Levine
- Computer Science · Robotics: Science and Systems
- 1 December 2018
This paper proposes a sample-efficient deep RL algorithm based on maximum entropy RL that learns neural network policies with minimal per-task tuning and only a modest number of trials, achieving state-of-the-art performance on simulated benchmarks with a single set of hyperparameters.
MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning
- Kevin Li, Abhishek Gupta, S. Levine
- Computer Science · International Conference on Machine Learning
- 15 July 2021
This work shows that an uncertainty-aware classifier can solve challenging reinforcement learning problems by both encouraging exploration and providing directed guidance towards positive outcomes, and proposes a novel mechanism for obtaining these calibrated, uncertainty-aware classifiers based on an amortized technique for computing the normalized maximum likelihood (NML) distribution.
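To make the NML idea concrete, here is the exact (non-amortized) conditional NML computation for a binary success classifier, which is the quantity MURAL approximates with an amortized scheme: refit the model once per candidate label of the query point, then normalize the resulting likelihoods. A sketch using scikit-learn (all names are ours, not the paper's code):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cnml_probs(X_train, y_train, x_query):
    """Exact conditional NML for a binary classifier on one query point."""
    likelihoods = []
    for y in (0, 1):
        # Refit with the query point assigned label y, then score that label.
        X = np.vstack([X_train, x_query[None]])
        t = np.append(y_train, y)
        clf = LogisticRegression().fit(X, t)
        likelihoods.append(clf.predict_proba(x_query[None])[0, y])
    likelihoods = np.array(likelihoods)
    return likelihoods / likelihoods.sum()  # normalized maximum likelihood

rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 2))
y_train = (X_train[:, 0] > 0).astype(int)
print(cnml_probs(X_train, y_train, np.array([5.0, 5.0])))
```

Far from the training data, both labels can be fit almost equally well, so the probabilities approach 0.5: exactly the calibrated uncertainty that the paper uses to drive exploration toward uncertain, potentially successful states.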
Bayesian Adaptation for Covariate Shift
- Aurick Zhou, S. Levine
- Computer Science · Neural Information Processing Systems
- 2021
Wayformer: Motion Forecasting via Simple & Efficient Attention Networks
- Nigamaa Nayakanti, Rami Al-Rfou, Aurick Zhou, Kratarth Goel, Khaled S. Refaat, Benjamin Sapp
- Computer Science · arXiv
- 12 July 2022
This paper presents Wayformer, a family of simple, homogeneous attention-based architectures for motion forecasting, and shows that early fusion, despite its simplicity of construction, is not only modality-agnostic but also achieves state-of-the-art results on both the Waymo Open Motion Dataset (WOMD) and Argoverse leaderboards.
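To illustrate what "early fusion" means here: each input modality is projected to a shared width, the token sequences are concatenated, and a single self-attention stack processes them jointly, so cross-modal interaction needs no bespoke wiring. A minimal PyTorch sketch in that spirit; the dimensions and module names are ours, not the paper's architecture:

```python
import torch
import torch.nn as nn

class EarlyFusionEncoder(nn.Module):
    def __init__(self, modality_dims, d_model=128, n_layers=2, n_heads=4):
        super().__init__()
        # One linear projection per modality into the shared token width.
        self.proj = nn.ModuleList(nn.Linear(d, d_model) for d in modality_dims)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, modalities):
        # modalities: list of (batch, tokens_i, dim_i) tensors, one per modality.
        tokens = torch.cat([p(m) for p, m in zip(self.proj, modalities)], dim=1)
        return self.encoder(tokens)  # joint attention over all modalities

# e.g., agent history, road graph, and traffic-signal tokens fused in one pass
enc = EarlyFusionEncoder(modality_dims=[16, 8, 4])
out = enc([torch.randn(2, 10, 16), torch.randn(2, 50, 8), torch.randn(2, 6, 4)])
```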
Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation
- Aurick Zhou, S. Levine
- Computer Science · International Conference on Machine Learning
- 5 November 2020
The amortized conditional normalized maximum likelihood (ACNML) method is proposed as a scalable general-purpose approach for uncertainty estimation, calibration, and out-of-distribution robustness with deep networks.
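The conditional NML distribution that ACNML approximates assigns each candidate label the likelihood it achieves after the model is refit with the query point included, normalized across labels (here θ̂_y maximizes likelihood on the training set plus the query point x labeled y):

```latex
p_{\mathrm{CNML}}(y \mid x) = \frac{ p_{\hat{\theta}_y}(y \mid x) }{ \sum_{y'} p_{\hat{\theta}_{y'}}(y' \mid x) },
\qquad
\hat{\theta}_y = \arg\max_{\theta} \; \log p_{\theta}(y \mid x) + \sum_{i=1}^{n} \log p_{\theta}(y_i \mid x_i)
```

ACNML makes this tractable for deep networks by replacing the per-query optimization with an approximate (Laplace-style) posterior over parameters, amortizing the cost across queries.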