Regularized Policies are Reward Robust
@inproceedings{Husain2021RegularizedPA,
  title     = {Regularized Policies are Reward Robust},
  author    = {H. Husain and K. Ciosek and Ryota Tomioka},
  booktitle = {AISTATS},
  year      = {2021}
}
Entropic regularization of policies in Reinforcement Learning (RL) is a commonly used heuristic to ensure that the learned policy explores the state space sufficiently before converging prematurely to a locally optimal policy. The primary motivation for using entropy is exploration and the disambiguation of optimal policies; however, its theoretical effects are not entirely understood. In this work, we study the more general regularized RL objective and, using Fenchel duality, derive the dual problem, which…
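To make the duality concrete, here is a minimal sketch in the standard occupancy-measure formulation; since the abstract is truncated, the notation below (occupancy measure \mu_\pi, convex regularizer \Omega, and its convex conjugate \Omega^*) is an assumption rather than the paper's exact statement. Applying biconjugation, \Omega(\mu) = \sup_y \langle\mu, y\rangle - \Omega^*(y), and substituting y = r - r' gives

\[
\max_{\pi}\ \langle \mu_\pi, r\rangle - \Omega(\mu_\pi)
\;=\;
\max_{\pi}\ \min_{r'}\ \Big[\langle \mu_\pi, r'\rangle + \Omega^{*}(r - r')\Big],
\qquad
\Omega^{*}(y) = \sup_{\mu}\ \langle \mu, y\rangle - \Omega(\mu).
\]

Read as a game, an adversary replaces the true reward r with a perturbed reward r' but pays the conjugate penalty \Omega^*(r - r') for deviating from the true reward, so a policy optimizing the regularized objective is simultaneously robust to this class of reward perturbations. For instance, when \Omega is the negative Shannon entropy over the simplex, \Omega^* is the log-sum-exp function, recovering the soft value familiar from maximum-entropy RL.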