Corpus ID: 202712723

On the Convergence of Approximate and Regularized Policy Iteration Schemes

E. Smirnova, Elvis Dohmatob
Published 2019 · Mathematics, Computer Science · ArXiv

Abstract: Algorithms based on the entropy-regularized framework, such as Soft Q-learning and Soft Actor-Critic, have recently shown state-of-the-art performance on a number of challenging reinforcement learning (RL) tasks. The regularized formulation modifies the standard RL objective and thus generally converges to a policy that differs from the optimal greedy policy of the original RL problem. In practice, it is important to control the suboptimality of the regularized optimal policy. In this paper, we…
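The gap between the regularized and greedy optima that the abstract refers to can be made concrete with a small numerical sketch. The following is not the paper's algorithm, just a minimal illustration of entropy-regularized (soft) value iteration on a hypothetical 2-state, 2-action MDP: the max in the Bellman backup is replaced by a log-sum-exp at temperature tau, and the resulting fixed point exceeds the unregularized one by at most tau·log|A|/(1−γ), so the suboptimality is controlled by the temperature.

```python
import numpy as np

# Hypothetical toy MDP (not from the paper): 2 states, 2 actions.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],   # P[s, a, s'] transition kernel
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],                  # R[s, a] immediate rewards
              [0.0, 0.5]])
gamma = 0.9

def soft_value_iteration(tau, iters=2000):
    """Value iteration with a soft (log-sum-exp) backup at temperature tau.

    tau = 0 recovers the standard greedy Bellman backup.
    """
    V = np.zeros(2)
    for _ in range(iters):
        Q = R + gamma * P @ V              # Q[s, a] one-step lookahead
        if tau > 0:
            # Soft backup: tau * log sum_a exp(Q/tau) >= max_a Q
            V = tau * np.log(np.exp(Q / tau).sum(axis=1))
        else:
            V = Q.max(axis=1)              # greedy backup
    return V

V_greedy = soft_value_iteration(0.0)
V_soft = soft_value_iteration(0.1)
# V_soft dominates V_greedy elementwise, and the excess is bounded
# by tau * log(2) / (1 - gamma); it vanishes as tau -> 0.
```

Lowering `tau` trades off exploration (a more stochastic softmax policy) against this suboptimality bound, which is the tension the paper's regularized policy iteration schemes address.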