• Corpus ID: 226227358

# Finding the Near Optimal Policy via Adaptive Reduced Regularization in MDPs

```bibtex
@article{Yang2020FindingTN,
  title={Finding the Near Optimal Policy via Adaptive Reduced Regularization in MDPs},
  author={Wenhao Yang and Xiang Li and Guangzeng Xie and Zhihua Zhang},
  journal={ArXiv},
  year={2020},
  volume={abs/2011.00213}
}
```
• Published 31 October 2020
• Computer Science
• ArXiv
Regularized MDPs serve as a smooth version of the original MDPs. However, the optimal policy of a regularized MDP is always biased. Instead of making the coefficient $\lambda$ of the regularization term sufficiently small, we propose an adaptive reduction scheme for $\lambda$ to approximate the optimal policy of the original MDP. It is shown that the iteration complexity for obtaining an $\epsilon$-optimal policy could be reduced in comparison with setting a sufficiently small $\lambda$. In addition, there exists…
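The bias-then-reduce idea in the abstract can be illustrated with a minimal sketch. It assumes a Shannon-entropy regularizer, a hypothetical two-state MDP, a halving schedule $\lambda \leftarrow \lambda/2$, and a fixed number of warm-started soft value-iteration sweeps per stage. None of these choices come from the abstract, so this is not the authors' algorithm or their reduction schedule; it only shows the mechanics: with a fixed large $\lambda$ the regularized optimal policy is noticeably suboptimal for the original MDP, while re-solving with warm starts as $\lambda$ shrinks drives that gap toward zero.

```python
import math

# Hypothetical 2-state, 2-action MDP (illustrative assumption, not from the paper).
# Action 0 moves to state 0, action 1 moves to state 1.
# P[s][a] is a list of (probability, next_state); R[s][a] is the reward.
P = [[[(1.0, 0)], [(1.0, 1)]],
     [[(1.0, 0)], [(1.0, 1)]]]
R = [[0.0, 1.0],
     [0.5, 2.0]]
GAMMA = 0.9


def q_values(V, P, R, gamma):
    """Q(s, a) = R(s, a) + gamma * E[V(s')] for every state-action pair."""
    return [[R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])
             for a in range(len(R[s]))] for s in range(len(R))]


def soft_backup(V, lam, P, R, gamma):
    """Entropy-regularized Bellman backup: V(s) = lam * log sum_a exp(Q(s, a) / lam)."""
    out = []
    for row in q_values(V, P, R, gamma):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        out.append(m + lam * math.log(sum(math.exp((q - m) / lam) for q in row)))
    return out


def softmax_policy(V, lam, P, R, gamma):
    """Regularized greedy policy: pi(a | s) proportional to exp(Q(s, a) / lam)."""
    pi = []
    for row in q_values(V, P, R, gamma):
        m = max(row)
        w = [math.exp((q - m) / lam) for q in row]
        z = sum(w)
        pi.append([x / z for x in w])
    return pi


def evaluate(pi, P, R, gamma, iters=2000):
    """Unregularized value of a stochastic policy (iterative policy evaluation)."""
    V = [0.0] * len(R)
    for _ in range(iters):
        Q = q_values(V, P, R, gamma)
        V = [sum(p * q for p, q in zip(pi[s], Q[s])) for s in range(len(R))]
    return V


def optimal_values(P, R, gamma, iters=2000):
    """Unregularized optimal values via standard value iteration."""
    V = [0.0] * len(R)
    for _ in range(iters):
        V = [max(row) for row in q_values(V, P, R, gamma)]
    return V


def adaptive_reduction(P, R, gamma, lam0=1.0, shrink=0.5, stages=8, inner=50):
    """Solve regularized MDPs with lam0, lam0*shrink, ..., warm-starting each
    stage from the previous stage's values. Schedule and inner iteration
    count are illustrative assumptions, not the paper's choices."""
    V = [0.0] * len(R)
    lam = lam0
    for stage in range(stages):
        for _ in range(inner):
            V = soft_backup(V, lam, P, R, gamma)
        if stage < stages - 1:
            lam *= shrink
    return softmax_policy(V, lam, P, R, gamma), lam


def value_gap(pi, P, R, gamma):
    """Largest shortfall of pi's unregularized value relative to the optimum."""
    v_star = optimal_values(P, R, gamma)
    v_pi = evaluate(pi, P, R, gamma)
    return max(a - b for a, b in zip(v_star, v_pi))


if __name__ == "__main__":
    # One stage with a fixed, large lambda versus the adaptive schedule.
    pi_fixed, _ = adaptive_reduction(P, R, GAMMA, lam0=1.0, stages=1, inner=400)
    pi_adapt, lam_final = adaptive_reduction(P, R, GAMMA)
    print("fixed lambda = 1.0, gap:", value_gap(pi_fixed, P, R, GAMMA))
    print("adaptive, final lambda =", lam_final, "gap:", value_gap(pi_adapt, P, R, GAMMA))
```

The warm start is the design choice that matters here: each stage begins from values that are already close to the next regularized fixed point, which is the intuition behind reducing $\lambda$ gradually rather than fixing a tiny $\lambda$ from the start. The sketch makes no claim about iteration counts.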
## Citations (4)

### Softmax Policy Gradient Methods Can Take Exponential Time to Converge

• Computer Science
COLT
• 2021
It is demonstrated that softmax PG methods can take exponential time to converge, even in the presence of a benign policy initialization and an initial state distribution amenable to exploration, and the exponential lower bound hints at the necessity of carefully adjusting update rules or enforcing proper regularization in accelerating PG methods.

### Finite-Time Analysis of Entropy-Regularized Neural Natural Actor-Critic Algorithm

• Computer Science
ArXiv
• 2022
It is proved that entropy regularization and averaging ensure stability by providing sufficient exploration to avoid near-deterministic and strictly suboptimal policies, and that regularization leads to sharp sample complexity and network width bounds in the regularized MDPs, yielding a favorable bias-variance tradeoff in policy optimization.

### The Power of Regularization in Solving Extensive-Form Games

• Computer Science, Mathematics
ArXiv
• 2022
This paper proposes a series of new algorithms based on regularizing the payoff functions of the game, and establishes a set of convergence results that strictly improve over the existing ones, with either weaker assumptions or stronger convergence guarantees.

### Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization

• Computer Science
NeurIPS
• 2021
Motivated by the algorithmic role of entropy regularization in single-agent reinforcement learning and game theory, this work develops provably efficient extragradient methods that find the quantal response equilibrium (QRE), i.e., the solution of a zero-sum two-player matrix game with entropy regularization, at a linear rate.

## References

Showing 1-10 of 33 references.

### A Regularized Approach to Sparse Optimal Policy in Reinforcement Learning

• Computer Science
NeurIPS
• 2019
A generic method for devising regularization forms is provided, off-policy actor-critic algorithms for complex environment settings are proposed, and a full mathematical analysis of the proposed regularized MDPs is conducted.

### On the Convergence of Approximate and Regularized Policy Iteration Schemes

• Computer Science, Mathematics
ArXiv
• 2019
This paper proposes the optimality-preserving regularized modified policy iteration (MPI) scheme that simultaneously provides desirable properties to intermediate policies such as targeted exploration, and guarantees convergence to the optimal policy with explicit rates depending on the decrease rate of the regularization parameter.

### A unified view of entropy-regularized Markov decision processes

• Computer Science
ArXiv
• 2017
A general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs) is proposed, showing that using the conditional entropy of the joint state-action distributions as regularization yields a dual optimization problem closely resembling the Bellman optimality equations.

### A Theory of Regularized Markov Decision Processes

• Computer Science
ICML
• 2019
A general theory of regularized Markov Decision Processes is proposed that generalizes existing entropy- and Kullback-Leibler-regularized approaches in two directions: a larger class of regularizers, and the general modified policy iteration approach, which encompasses both policy iteration and value iteration.

### Global Optimality Guarantees For Policy Gradient Methods

• Computer Science
ArXiv
• 2019
This work identifies structural properties, shared by finite MDPs and several classic control problems, which guarantee that the policy gradient objective function has no suboptimal local minima despite being non-convex.

### Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs

• Computer Science
AAAI
• 2020
This work shows that the adaptive scaling mechanism used in TRPO is in fact the natural “RL version” of traditional trust-region methods from convex analysis, and proves fast rates of Õ(1/N), much like results in convex optimization.

### Dynamic policy programming

• Computer Science
J. Mach. Learn. Res.
• 2012
The finite-iteration and asymptotic $\ell_\infty$-norm performance-loss bounds for DPP in the presence of approximation/estimation error are proved; they suggest that DPP can achieve better performance than approximate value iteration (AVI) and approximate policy iteration (API), since it averages out the simulation noise caused by Monte-Carlo sampling throughout the learning process.

### On the Global Convergence Rates of Softmax Policy Gradient Methods

• Computer Science
ICML
• 2020
It is shown that with the true gradient, policy gradient with a softmax parametrization converges at a $O(1/t)$ rate, with constants depending on the problem and initialization, which significantly expands the recent asymptotic convergence results.
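To make the setting of this result concrete, here is a minimal sketch of exact-gradient softmax policy gradient on the simplest special case, a one-state MDP (multi-armed bandit). There $J(\theta) = \sum_a \pi_\theta(a)\, r(a)$ and $\partial J / \partial \theta_a = \pi_\theta(a)\,(r(a) - J(\theta))$. The reward vector, the step size $\eta = 0.4$, and the checkpoints are hypothetical choices for illustration, not the constants analyzed in the cited paper; printing $t \cdot \text{gap}$ lets one check whether the suboptimality shrinks roughly like $1/t$ on this instance.

```python
import math


def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    m = max(theta)
    w = [math.exp(x - m) for x in theta]
    z = sum(w)
    return [x / z for x in w]


def softmax_pg(rewards, eta=0.4, steps=10_000, checkpoints=(100, 1_000, 10_000)):
    """Exact-gradient softmax policy gradient on a bandit (hypothetical setup).

    J(theta) = sum_a pi_a * r_a and dJ/dtheta_a = pi_a * (r_a - J).
    Returns {t: r* - J(theta_t)} for each checkpoint t.
    """
    theta = [0.0] * len(rewards)  # uniform initial policy
    best = max(rewards)
    gaps = {}
    for t in range(1, steps + 1):
        pi = softmax(theta)
        J = sum(p * r for p, r in zip(pi, rewards))
        # Simultaneous gradient-ascent step on all logits.
        theta = [th + eta * p * (r - J) for th, p, r in zip(theta, pi, rewards)]
        if t in checkpoints:
            pi = softmax(theta)
            gaps[t] = best - sum(p * r for p, r in zip(pi, rewards))
    return gaps


if __name__ == "__main__":
    for t, g in softmax_pg([1.0, 0.8, 0.1]).items():
        print(f"t={t:>6}  gap={g:.3e}  t*gap={t * g:.3f}")
```

A bandit is used only because its gradient has a closed form; the cited result concerns general finite MDPs, where the gradient also involves discounted state-visitation weights.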

### On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift

• Computer Science
J. Mach. Learn. Res.
• 2021
This work provides provable characterizations of the computational, approximation, and sample size properties of policy gradient methods in the context of discounted Markov Decision Processes (MDPs), and shows an important interplay between estimation error, approximation error, and exploration.

### Understanding the impact of entropy on policy optimization

• Computer Science
ICML
• 2019
New tools for understanding the optimization landscape are presented, it is shown that policy entropy serves as a regularizer, and the challenge of designing general-purpose policy optimization algorithms is highlighted.