# Reinforcement Learning with Deep Energy-Based Policies

@inproceedings{Haarnoja2017ReinforcementLW, title={Reinforcement Learning with Deep Energy-Based Policies}, author={Tuomas Haarnoja and Haoran Tang and P. Abbeel and Sergey Levine}, booktitle={ICML}, year={2017} }

We propose a method for learning expressive energy-based policies for continuous states and actions, which has previously been feasible only in tabular domains. We apply our method to learning maximum entropy policies, resulting in a new algorithm, called soft Q-learning, that expresses the optimal policy via a Boltzmann distribution. We use the recently proposed amortized Stein variational gradient descent to learn a stochastic sampling network that approximates samples from this distribution…
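As a rough illustration of the energy-based policy described in the abstract, the sketch below evaluates the Boltzmann form π(a|s) ∝ exp(Q(s,a)/α) and the corresponding soft value V(s) = α log Σₐ exp(Q(s,a)/α) on a discretized action grid. This is a minimal, hypothetical example (the Q-values and temperature α are made up); the paper itself handles continuous actions by training a stochastic sampling network with amortized SVGD rather than enumerating actions.

```python
import numpy as np

def soft_value(q_values, alpha=1.0):
    # Soft value: V(s) = alpha * log sum_a exp(Q(s, a) / alpha),
    # computed stably via the log-sum-exp trick.
    z = q_values / alpha
    m = np.max(z)
    return alpha * (m + np.log(np.sum(np.exp(z - m))))

def boltzmann_policy(q_values, alpha=1.0):
    # Energy-based policy: pi(a|s) proportional to exp(Q(s, a) / alpha).
    z = q_values / alpha
    z = z - np.max(z)  # numerical stability; constants cancel on normalization
    p = np.exp(z)
    return p / p.sum()

# Hypothetical Q-values over a discretized 1-D action grid.
q = np.array([1.0, 2.0, 2.0, 0.5])
pi = boltzmann_policy(q, alpha=0.5)
v = soft_value(q, alpha=0.5)
```

Note that the two equally good actions receive equal probability mass, which is the multimodality that a deterministic or unimodal Gaussian policy cannot express.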

## 734 Citations

Learning Skill Embeddings for Transferable Robot Skills

- Computer Science, NIPS 2017
- 2017

The main contribution of this work is an entropy-regularized policy gradient formulation for hierarchical policies, and an associated data-efficient and robust off-policy gradient algorithm based on stochastic value gradients.

Bayesian Deep Q-Learning via Continuous-Time Flows

- Computer Science
- 2018

This work proposes an efficient algorithm for Bayesian deep Q-learning by posterior sampling actions in the Q-function via continuous-time flows (CTFs), achieving efficient exploration without explicit assumptions on the forms of posterior distributions.

Deep Active Inference as Variational Policy Gradients

- Computer Science, Journal of Mathematical Psychology
- 2020

Soft Action Particle Deep Reinforcement Learning for a Continuous Action Space

- Computer Science, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- 2019

A new off-policy actor-critic algorithm is introduced that significantly reduces the number of parameters compared to existing actor-critic algorithms without any loss in performance.

A Regularized Implicit Policy for Offline Reinforcement Learning

- Computer Science, ArXiv
- 2022

This work proposes a simple modification to classical policy-matching methods for regularizing with respect to the dual form of the Jensen–Shannon divergence and integral probability metrics, and theoretically shows the correctness of the policy-matching approach.

Evolved Policy Gradients

- Computer Science, NeurIPS
- 2018

Empirical results show that the evolved policy gradient algorithm (EPG) learns faster on several randomized environments than an off-the-shelf policy gradient method, that its learned loss can generalize to out-of-distribution test-time tasks, and that it exhibits qualitatively different behavior from other popular meta-learning algorithms.

Quantile Regression Deep Reinforcement Learning

- Computer Science, ArXiv
- 2019

This work models the policy implicitly in the network, which gives the agent the freedom to approximate any distribution in each action dimension, not limiting its capabilities to the commonly used unimodal Gaussian parameterization.

Acquiring Diverse Robot Skills via Maximum Entropy Deep Reinforcement Learning

- Computer Science
- 2018

This thesis studies how the maximum entropy framework can provide efficient deep reinforcement learning algorithms that solve tasks consistently and sample-efficiently, and devises new algorithms based on this framework, starting from soft Q-learning, which learns expressive energy-based policies, to soft actor-critic, which provides the simplicity and convenience of actor-critic methods.

Particle-Based Adaptive Discretization for Continuous Control using Deep Reinforcement Learning

- Computer Science, ArXiv
- 2020

This paper proposes a general, yet simple, framework for improving the action exploration of policy gradient DRL algorithms that adapts ideas from the particle filtering literature to dynamically discretize the continuous action space and track policies represented as a mixture of Gaussians.

## References

Showing 1–10 of 53 references

Continuous control with deep reinforcement learning

- Computer Science, ICLR
- 2016

This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.

Actor-Critic Reinforcement Learning with Energy-Based Policies

- Computer Science, EWRL
- 2012

This work introduces the first sound and efficient algorithm for training energy-based policies, based on an actor-critic architecture, that is computationally efficient, converges close to a local optimum, and outperforms Sallans and Hinton (2004) in several high-dimensional domains.

Continuous Deep Q-Learning with Model-based Acceleration

- Computer Science, ICML
- 2016

This paper derives a continuous variant of the Q-learning algorithm, which it is called normalized advantage functions (NAF), as an alternative to the more commonly used policy gradient and actor-critic methods, and substantially improves performance on a set of simulated robotic control tasks.
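The key trick in NAF is to make the advantage a negative quadratic in the action, so that argmax_a Q(s,a) = μ(s) is available in closed form. A minimal sketch (with made-up numbers; the actual method parameterizes μ(s), P(s), and V(s) with neural networks and trains them by Q-learning):

```python
import numpy as np

def naf_q(a, mu, P, v):
    # Normalized advantage function:
    #   Q(s, a) = V(s) - 1/2 (a - mu(s))^T P(s) (a - mu(s))
    # With P(s) positive definite, Q is maximized exactly at a = mu(s),
    # so the greedy action never requires a numerical argmax.
    d = a - mu
    return v - 0.5 * d @ P @ d

# Hypothetical state-dependent quantities for a 2-D action space.
mu = np.array([0.5, -0.2])       # greedy action mu(s)
P = np.array([[2.0, 0.0],
              [0.0, 1.0]])       # positive-definite precision P(s)
v = 3.0                          # state value V(s)
```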

Free-energy-based reinforcement learning in a partially observable environment

- Computer Science, ESANN
- 2010

This study extends the FERL framework to handle partially observable MDPs (POMDPs) by incorporating a recurrent neural network that learns a memory representation sufficient for predicting future observations and rewards.

Asynchronous Methods for Deep Reinforcement Learning

- Computer Science, ICML
- 2016

A conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers and shows that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using a visual input.

Stein Variational Policy Gradient

- Computer Science, UAI
- 2017

A novel Stein variational policy gradient method (SVPG) which combines existing policy gradient methods and a repulsive functional to generate a set of diverse but well-behaved policies is proposed.
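For intuition, one SVGD update moves a set of particles along the kernelized Stein direction: a kernel-weighted gradient term pulls particles toward high probability, while the kernel's gradient acts as a repulsive force that keeps them diverse. This is a minimal sketch with an RBF kernel and an arbitrary bandwidth; SVPG applies the same update to policy parameters rather than to a toy distribution.

```python
import numpy as np

def rbf_kernel(x, y, h=1.0):
    # k(x, y) = exp(-||x - y||^2 / h) and its gradient w.r.t. x.
    d = x - y
    k = np.exp(-np.dot(d, d) / h)
    return k, -2.0 / h * d * k

def svgd_step(particles, grad_logp, step=0.1, h=1.0):
    # One SVGD update:
    #   phi(x_i) = (1/n) sum_j [ k(x_j, x_i) grad_logp(x_j)
    #                            + grad_{x_j} k(x_j, x_i) ]
    # The first term drives particles uphill; the second repels them.
    n = len(particles)
    new = np.copy(particles)
    for i in range(n):
        phi = np.zeros_like(particles[i])
        for j in range(n):
            k, gk = rbf_kernel(particles[j], particles[i], h)
            phi += k * grad_logp(particles[j]) + gk
        new[i] = particles[i] + step * phi / n
    return new
```

Run against a standard normal target (grad log p(x) = -x), the particles drift toward the mode without collapsing onto it, approximating the target distribution rather than a point estimate.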

Playing Atari with Deep Reinforcement Learning

- Computer Science, ArXiv
- 2013

This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning, which outperforms all previous approaches on six of the games and surpasses a human expert on three of them.

Hierarchical Relative Entropy Policy Search

- Computer Science, AISTATS
- 2012

This work defines the problem of learning sub-policies in continuous state-action spaces as finding a hierarchical policy composed of a high-level gating policy that selects low-level sub-policies for execution by the agent, and treats the sub-policies as latent variables, which allows the update information to be distributed between them.

Maximum Entropy Inverse Reinforcement Learning

- Computer Science, AAAI
- 2008

A probabilistic approach based on the principle of maximum entropy that provides a well-defined, globally normalized distribution over decision sequences, while providing the same performance guarantees as existing methods is developed.

Reinforcement Learning with Unsupervised Auxiliary Tasks

- Computer Science, ICLR
- 2017

This paper significantly outperforms the previous state of the art on Atari, averaging 880% expert human performance, and on a challenging suite of first-person, three-dimensional Labyrinth tasks achieves a mean 10× speedup in learning, averaging 87% expert human performance on Labyrinth.