Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
- Tuomas Haarnoja, Aurick Zhou, P. Abbeel, S. Levine
- Computer Science, International Conference on Machine Learning
- 4 January 2018
This paper proposes soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework, and achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods.
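The maximum entropy framework shows up most directly in the critic's regression target, which adds an entropy bonus to the usual bootstrapped value. The sketch below computes a single-sample soft Bellman target. It is an illustration under stated assumptions, not the paper's code: the combined form (minimum of two Q estimates minus alpha times the log-probability) follows the later "Soft Actor-Critic Algorithms and Applications" variant, while the original paper routes the same quantity through a separate learned state-value function. The names soft_q_target, alpha and gamma are assumptions.

```python
def soft_q_target(reward, done, next_q1, next_q2, next_log_prob, alpha=0.2, gamma=0.99):
    """Single-sample soft Bellman target for a Q-function critic.

    alpha (entropy temperature) and gamma (discount) are illustrative defaults.
    """
    # Use the smaller of two next-state Q estimates (assumed twin-critic setup).
    next_q = min(next_q1, next_q2)
    # Soft value of the next state: subtracting alpha * log pi(a'|s') rewards
    # next actions that the stochastic policy samples with low probability.
    soft_next_value = next_q - alpha * next_log_prob
    # Do not bootstrap past terminal transitions (done = 1.0).
    return reward + gamma * (1.0 - done) * soft_next_value


if __name__ == "__main__":
    # A more random next action (more negative log-prob) raises the target.
    print(soft_q_target(1.0, 0.0, 2.0, 3.0, next_log_prob=-1.0, alpha=0.5, gamma=0.9))
    print(soft_q_target(1.0, 0.0, 2.0, 3.0, next_log_prob=-3.0, alpha=0.5, gamma=0.9))
```

Setting alpha to zero recovers an ordinary (twin-critic) Bellman target, which is one way to see the entropy term as the only change the framework makes to the critic.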
Soft Actor-Critic Algorithms and Applications
- Tuomas Haarnoja, Aurick Zhou, S. Levine
- Computer Science, arXiv
- 13 December 2018
Soft Actor-Critic (SAC), the recently introduced off-policy actor-critic algorithm based on the maximum entropy RL framework, achieves state-of-the-art performance, outperforming prior on-policy and off-policy methods in sample efficiency and asymptotic performance.
Reinforcement Learning with Deep Energy-Based Policies
- Tuomas Haarnoja, Haoran Tang, P. Abbeel, S. Levine
- Computer Science, International Conference on Machine Learning
- 27 February 2017
This paper proposes a method for learning expressive energy-based policies over continuous states and actions, which had previously been feasible only in tabular domains, and applies it in a new algorithm, soft Q-learning, that expresses the optimal policy via a Boltzmann distribution.
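For a finite action set, the Boltzmann form can be written down directly: the soft value is a temperature-scaled log-sum-exp of the Q-values, and the policy is proportional to exp(Q / alpha). This discrete-action sketch is a simplification; the paper targets continuous actions, where the normalizer is intractable and the authors instead train a sampling network. The function names and the temperature parameter alpha are assumptions.

```python
import math


def soft_value(q_values, alpha=1.0):
    """alpha * log(sum(exp(Q / alpha))), computed stably by factoring out the max Q."""
    m = max(q_values)
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_values))


def boltzmann_policy(q_values, alpha=1.0):
    """Action probabilities pi(a) = exp((Q(a) - V) / alpha), which sum to one."""
    v = soft_value(q_values, alpha)
    return [math.exp((q - v) / alpha) for q in q_values]


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]
    print(boltzmann_policy(q, alpha=1.0))  # higher Q, higher probability
    print(boltzmann_policy(q, alpha=0.1))  # nearly greedy at low temperature
```

Lowering alpha moves the policy toward the greedy argmax of Q, while raising it spreads probability across actions; this temperature is what lets the energy-based policy stay multimodal when several actions are nearly as good.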
Latent Space Policies for Hierarchical Reinforcement Learning
- Tuomas Haarnoja, Kristian Hartikainen, P. Abbeel, S. Levine
- Computer Science, International Conference on Machine Learning
- 9 April 2018
This work addresses the problem of learning hierarchical deep neural network policies for reinforcement learning by constraining the mapping from latent variables to actions to be invertible, and shows that this method can solve more complex sparse-reward tasks by learning higher-level policies on top of high-entropy skills optimized for simple low-level objectives.
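Requiring the latent-to-action mapping to be invertible means each layer can map a latent sample to an action and back without loss, and its log-determinant keeps action likelihoods tractable. The sketch below shows one elementwise affine bijection whose scale and shift depend on the observation, in the spirit of the real NVP-style layers this line of work builds on. The toy linear scale_and_shift function stands in for learned networks and, like the other names here, is hypothetical.

```python
import math


def scale_and_shift(observation):
    """Hypothetical stand-in for learned networks mapping an observation
    to a per-dimension log-scale and shift."""
    log_scale = [0.1 * o for o in observation]
    shift = [0.5 * o for o in observation]
    return log_scale, shift


def latent_to_action(latent, observation):
    """Forward pass: a = h * exp(log_scale(obs)) + shift(obs), elementwise."""
    log_scale, shift = scale_and_shift(observation)
    return [h * math.exp(ls) + t for h, ls, t in zip(latent, log_scale, shift)]


def action_to_latent(action, observation):
    """Exact inverse of latent_to_action for the same observation."""
    log_scale, shift = scale_and_shift(observation)
    return [(a - t) * math.exp(-ls) for a, ls, t in zip(action, log_scale, shift)]


def log_det_jacobian(observation):
    """log |det(da/dh)| is the sum of the log-scales."""
    log_scale, _ = scale_and_shift(observation)
    return sum(log_scale)


if __name__ == "__main__":
    obs = [1.0, -2.0]
    h = [0.3, -0.7]
    a = latent_to_action(h, obs)
    print(a, action_to_latent(a, obs))  # second list recovers h
```

Because nothing is lost in either direction, a higher-level policy that outputs latents retains full control over the actions a lower layer can produce.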
Composable Deep Reinforcement Learning for Robotic Manipulation
- Tuomas Haarnoja, Vitchyr H. Pong, Aurick Zhou, Murtaza Dalal, P. Abbeel, S. Levine
- Computer Science, IEEE International Conference on Robotics and…
- 19 March 2018
This paper shows that policies learned with soft Q-learning can be composed to create new policies, and that the optimality of the resulting policy can be bounded in terms of the divergence between the composed policies.
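A sketch of the composition idea, assuming the combined task's reward is the average of the constituent rewards: the composed agent averages the soft Q-functions learned for each task and acts with the Boltzmann policy of that average, and the bound mentioned above measures how far this shortcut can fall from the true optimum. A small discrete action set is assumed so the policy can be normalized exactly; the function names and example Q-values are invented for illustration.

```python
import math


def average_q(q_tables):
    """Average per-action soft Q estimates (one list per task, same action order)."""
    n = len(q_tables)
    return [sum(values) / n for values in zip(*q_tables)]


def composed_policy(q_tables, alpha=1.0):
    """Boltzmann policy over the averaged Q-values (discrete-action sketch)."""
    avg = average_q(q_tables)
    m = max(avg)
    weights = [math.exp((q - m) / alpha) for q in avg]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    q_task_a = [2.0, 0.0, 1.0]  # task A alone prefers action 0
    q_task_b = [0.0, 2.0, 1.5]  # task B alone prefers action 1
    # The composed policy favors action 2, a compromise neither task ranks first.
    print(composed_policy([q_task_a, q_task_b]))
```

No new training is needed at composition time, which is the practical appeal: previously learned Q-functions are reused as-is.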
Backprop KF: Learning Discriminative Deterministic State Estimators
- Tuomas Haarnoja, Anurag Ajay, S. Levine, P. Abbeel
- Computer Science, NIPS
- 23 May 2016
This work presents an alternative to generative state estimation in which the parameters of the latent state distribution are directly optimized as a deterministic computation graph, yielding a simple and effective gradient descent algorithm for training discriminative state estimators.
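A Kalman filter step illustrates what a deterministic computation graph over a belief means here: every output is a smooth function of the inputs, so the filter can be unrolled and gradients can flow back into whatever produced the observations and their noise. The one-dimensional model below is a hypothetical illustration; treating z and obs_var as outputs of a learned observation network is an assumption about the discriminative setup, and all names are invented.

```python
def kalman_step(mean, var, z, obs_var, a=1.0, process_var=0.1):
    """One predict-and-update step of a scalar Kalman filter.

    In a discriminative setup, z and obs_var would come from a learned
    observation network (assumption); here they are plain numbers.
    """
    # Predict: propagate the belief through linear dynamics x' = a * x + noise.
    pred_mean = a * mean
    pred_var = a * a * var + process_var
    # Update: blend prediction and observation according to their variances.
    gain = pred_var / (pred_var + obs_var)
    new_mean = pred_mean + gain * (z - pred_mean)
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var


def run_filter(observations, obs_vars, mean=0.0, var=1.0):
    """Unroll the filter over a sequence and return the posterior means."""
    means = []
    for z, r in zip(observations, obs_vars):
        mean, var = kalman_step(mean, var, z, r)
        means.append(mean)
    return means


if __name__ == "__main__":
    print(run_filter([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```

Since the update uses only arithmetic, an autodiff framework could differentiate a loss on the posterior means with respect to obs_var, which is what lets the noise model be trained end to end rather than specified by hand.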
Learning to Walk via Deep Reinforcement Learning
- Tuomas Haarnoja, Aurick Zhou, Sehoon Ha, Jie Tan, G. Tucker, S. Levine
- Computer Science, Robotics: Science and Systems
- 1 December 2018
This work proposes a sample-efficient deep RL algorithm based on maximum entropy RL that requires minimal per-task tuning and only a modest number of trials to learn neural network policies; it achieves state-of-the-art performance on simulated benchmarks with a single set of hyperparameters.
Dynamical Distance Learning for Semi-Supervised and Unsupervised Skill Discovery
- Kristian Hartikainen, Xinyang Geng, Tuomas Haarnoja, S. Levine
- Computer Science, International Conference on Learning…
- 18 July 2019
This paper studies how to automatically learn dynamical distances: a measure of the expected number of time steps to reach a given goal state from any other state, which can be used to provide well-shaped reward functions for reaching new goals, making it possible to learn complex tasks efficiently.
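Training data for a dynamical distance can be read straight off collected trajectories: if state s_j is visited j - i steps after state s_i, then j - i is one sample of the time needed to reach s_j from s_i. As we understand the approach, a network is regressed onto such step counts; the tabular averaging below is a simplified stand-in for that regression, and the function names and string-labelled states are illustrative assumptions.

```python
from collections import defaultdict


def distance_targets(trajectory):
    """(state, goal, steps) for every ordered pair i <= j within one trajectory."""
    pairs = []
    for i in range(len(trajectory)):
        for j in range(i, len(trajectory)):
            pairs.append((trajectory[i], trajectory[j], j - i))
    return pairs


def tabular_distance(trajectories):
    """Average observed step counts per (state, goal) pair.

    A Monte Carlo stand-in for a learned distance function d(state, goal).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for traj in trajectories:
        for state, goal, steps in distance_targets(traj):
            sums[(state, goal)] += steps
            counts[(state, goal)] += 1
    return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    d = tabular_distance([["a", "b", "c"], ["a", "c"]])
    print(d[("a", "c")])  # averages the 2-step and 1-step observations
```

A shaped reward for reaching goal g can then be taken as the negative estimated distance, which is the use the summary above describes.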
From Motor Control to Team Play in Simulated Humanoid Football
This work optimized teams of agents to play simulated football via reinforcement learning, constraining the solution space to that of plausible movements learned using human motion capture data, resulting in a team of coordinated humanoid football players that exhibit complex behavior at different scales, quantified by a range of analysis and statistics.
Acquiring Diverse Robot Skills via Maximum Entropy Deep Reinforcement Learning
- Tuomas Haarnoja
- Computer Science
- 2018
This thesis studies how the maximum entropy framework can provide efficient deep reinforcement learning algorithms that solve tasks consistently and sample-efficiently, and devises new algorithms based on this framework, from soft Q-learning, which learns expressive energy-based policies, to soft actor-critic, which offers the simplicity and convenience of actor-critic methods.