On the Convergence of Approximate and Regularized Policy Iteration Schemes
@article{Smirnova2019OnTC, title={On the Convergence of Approximate and Regularized Policy Iteration Schemes}, author={E. Smirnova and Elvis Dohmatob}, journal={ArXiv}, year={2019}, volume={abs/1909.09621} }
Algorithms based on the entropy-regularized framework, such as Soft Q-learning and Soft Actor-Critic, have recently shown state-of-the-art performance on a number of challenging reinforcement learning (RL) tasks. The regularized formulation modifies the standard RL objective and thus, in general, converges to a policy different from the optimal greedy policy of the original RL problem. In practice, it is therefore important to control the suboptimality of the regularized optimal policy. In this paper, we…
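The suboptimality mentioned in the abstract can be made concrete on a toy problem. The sketch below is illustrative only. It assumes a hypothetical 2-state, 2-action MDP, Shannon-entropy regularization with temperature tau, and exact soft value iteration in place of the approximate and regularized policy iteration schemes the paper studies. All names (`optimal_q`, `softmax_policy`, `suboptimality`, etc.) are hypothetical. It checks a standard bound from the entropy-regularized MDP literature, V*(s) - V^{pi_tau}(s) <= tau * log|A| / (1 - gamma), which is not necessarily the bound derived in this paper.

```python
import math

# Hypothetical toy MDP (not from the paper): 2 states, 2 actions, deterministic
# transitions. P[s][a] is the next state, R[s][a] the immediate reward.
P = [[0, 1], [0, 1]]
R = [[0.0, 1.0], [0.5, 2.0]]
GAMMA = 0.9
N_S, N_A = 2, 2


def lookahead(v):
    """Q(s, a) = R(s, a) + gamma * V(P(s, a)) for deterministic transitions."""
    return [[R[s][a] + GAMMA * v[P[s][a]] for a in range(N_A)] for s in range(N_S)]


def soft_max(qs, tau):
    """tau * log sum_a exp(q_a / tau), computed stably; tau == 0 gives the hard max.

    For tau > 0 this equals max_pi <pi, q> + tau * H(pi), i.e. the backup of the
    entropy-regularized problem.
    """
    m = max(qs)
    if tau == 0.0:
        return m
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in qs))


def optimal_q(tau, iters=2000):
    """Standard (tau == 0) or soft (tau > 0) value iteration; returns the Q-table."""
    v = [0.0] * N_S
    for _ in range(iters):
        v = [soft_max(qs, tau) for qs in lookahead(v)]
    return lookahead(v)


def softmax_policy(q, tau):
    """Regularized optimal policy: pi_tau(a|s) proportional to exp(Q_tau(s, a) / tau)."""
    pi = []
    for qs in q:
        m = max(qs)
        w = [math.exp((x - m) / tau) for x in qs]
        z = sum(w)
        pi.append([x / z for x in w])
    return pi


def evaluate(pi, iters=2000):
    """Unregularized value of a stochastic policy (iterative policy evaluation)."""
    v = [0.0] * N_S
    for _ in range(iters):
        q = lookahead(v)
        v = [sum(pi[s][a] * q[s][a] for a in range(N_A)) for s in range(N_S)]
    return v


def suboptimality(tau):
    """Return (max_s V*(s) - V^{pi_tau}(s), tau * log|A| / (1 - gamma))."""
    v_star = [max(qs) for qs in optimal_q(0.0)]
    v_pi = evaluate(softmax_policy(optimal_q(tau), tau))
    gap = max(vs - vp for vs, vp in zip(v_star, v_pi))
    return gap, tau * math.log(N_A) / (1 - GAMMA)


for tau in (1.0, 0.1, 0.01):
    gap, bound = suboptimality(tau)
    print(f"tau={tau:<5} gap={gap:.4f} bound={bound:.4f}")
```

Shrinking tau moves the softmax policy toward the greedy one and drives the gap toward zero, at the cost of weaker regularization; this is the trade-off that makes controlling the suboptimality of the regularized policy relevant.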
One Citation
Finding the Near Optimal Policy via Adaptive Reduced Regularization in MDPs
- Computer Science, Mathematics
- ArXiv
- 2020