Accelerating Procedures of the Value Iteration Algorithm for Discounted Markov Decision Processes, Based on a One-Step Lookahead Analysis
@article{Herzberg1994AcceleratingPO, title={Accelerating Procedures of the Value Iteration Algorithm for Discounted Markov Decision Processes, Based on a One-Step Lookahead Analysis}, author={Meir Herzberg and Uri Yechiali}, journal={Operations Research}, year={1994}, volume={42}, pages={940--946} }
Accelerating procedures for solving discounted Markov decision processes problems are developed based on a one-step lookahead analysis of the value iteration algorithm. We apply the criteria of minimum difference and minimum variance to obtain good adaptive relaxation factors that speed up the convergence of the algorithm. Several problems including Howard's automobile replacement are tested and a preliminary numerical evaluation reveals considerable reductions in computation time when compared…
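The relaxed update underlying the abstract can be sketched as follows. This is a minimal illustration of value iteration with a relaxation factor ω applied to the one-step lookahead direction, not the authors' adaptive minimum-difference or minimum-variance procedure; the array shapes, the fixed ω, and the stopping tolerance are assumptions made for the example.

```python
import numpy as np

def relaxed_value_iteration(P, R, gamma, omega=1.0, tol=1e-8, max_iter=10_000):
    """Value iteration with a relaxation factor (minimal sketch).

    P : hypothetical (A, S, S) array of transition matrices, one per action.
    R : hypothetical (A, S) array of one-step rewards.
    The relaxed update is V <- V + omega * (T(V) - V); omega = 1 recovers
    standard value iteration.  Herzberg and Yechiali instead choose omega
    adaptively at each step (minimum-difference / minimum-variance
    criteria), which is not reproduced here.
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        # One-step lookahead: Q[a, s] = R[a, s] + gamma * sum_j P[a, s, j] * V[j]
        Q = R + gamma * np.einsum('aij,j->ai', P, V)
        TV = Q.max(axis=0)          # Bellman operator applied to V
        d = TV - V                  # update direction
        # Sup-norm stopping rule (span-based bounds can stop earlier,
        # but are omitted in this sketch).
        if np.max(np.abs(d)) < tol:
            return TV
        V = V + omega * d           # relaxed step
    return V
```

With `omega=1.0` this reduces to plain value iteration; the paper's contribution is picking ω adaptively so that each relaxed step travels further toward the fixed point than the plain Bellman update.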
17 Citations
A K-step look-ahead analysis of value iteration algorithms for Markov decision processes
- Computer Science
- 1996
Acceleration Operators in the Value Iteration Algorithms for Markov Decision Processes
- Mathematics, Oper. Res.
- 2010
We study the general approach to accelerating the convergence of the most widely used solution method of Markov decision processes (MDPs) with the total expected discounted reward. Inspired by the…
Acceleration of Iterative Methods for Markov Decision Processes
- Computer Science
- 2010
A class of operators is introduced that can be integrated into value iteration and modified policy iteration algorithms for Markov decision processes, so as to speed up the convergence of the iterative search.
Accelerated modified policy iteration algorithms for Markov decision processes
- Mathematics, Math. Methods Oper. Res.
- 2013
In the new policy iteration, an additional operator is applied to the iterate generated by the Markov operator, resulting in a bigger improvement in each iteration of the modified policy iteration method.
An Adaptive State Aggregation Algorithm for Markov Decision Processes
- Computer Science, ArXiv
- 2021
This paper proposes an intuitive algorithm for solving MDPs that reduces the cost of value iteration updates by dynamically grouping together states with similar cost-to-go values and proves that the algorithm converges almost surely to within 2ε/(1 − γ) of the true optimal value in the ℓ∞ norm.
A First-Order Approach to Accelerated Value Iteration
- Computer Science, Operations Research
- 2022
The authors provide a lower bound on the convergence rate of any first-order algorithm for solving MDPs, showing that no such algorithm can converge faster than VI in the worst case, and introduce safe accelerated value iteration (S-AVI), which alternates between accelerated updates and value iteration updates.
GENERAL DYNAMIC PROGRAMMING ALGORITHMS APPLIED TO POLLING
- Computer Science
- 1998
A new implementation of the modified policy iteration (MPI) dynamic programming algorithm is developed to efficiently solve problems with large state spaces and small action spaces, and provides evidence that MPI outperforms the other algorithms for both the discounted cost and the average cost optimal polling problems.
Discounted Markov Decision Processes with Constrained Costs: the decomposition approach
- Mathematics
- 2021
A decomposition of the state space into strongly communicating classes is investigated for computing an optimal or nearly optimal stationary policy for discrete-time Markov decision processes with finite state and action spaces.
General dynamic programming algorithms applied to polling systems
- Computer Science
- 1998
A new implementation of the modified policy iteration (MPI) dynamic programming algorithm is developed to efficiently solve problems with large state spaces and small action spaces and to provide evidence that MPI outperforms the other algorithms for both the discounted cost and the average cost optimal polling problems.
References
Showing 1-10 of 17 references
Criteria for selecting the relaxation factor of the value iteration algorithm for undiscounted Markov and semi-Markov decision processes
- Mathematics, Oper. Res. Lett.
- 1991
The action elimination algorithm for Markov decision processes
- Computer Science
- 1976
An efficient algorithm for solving Markov decision problems is proposed, and the nonoptimality test, an extension of Hastings' test for the undiscounted reward case, is used to identify actions that cannot be optimal at the current stage.
Modified Policy Iteration Algorithms for Discounted Markov Decision Problems
- Computer Science
- 1978
A class of modified policy iteration algorithms for solving Markov decision problems corresponds to performing policy evaluation by successive approximations; it is shown that all of these algorithms converge at least as quickly as successive approximations, and estimates of their rates of convergence are obtained.
The Convergence of Value Iteration in Discounted Markov Decision Processes
- Mathematics
- 1994
Abstract Considerable numerical experience indicates that the standard value iteration procedure for infinite horizon discounted Markov decision processes performs much better than the usual error…
Letter to the Editor - A Test for Suboptimal Actions in Markovian Decision Problems
- Mathematics, Oper. Res.
- 1967
A simple test is described that may show that certain actions are suboptimal, permanently eliminating them from further consideration, and may be incorporated into the dynamic programming routine for solving the decision problem.
Technical Note - Accelerated Computation of the Expected Discounted Return in a Markov Chain
- Mathematics, Oper. Res.
- 1978
This note investigates the use of extrapolations with certain iterative methods to accelerate the computation of the expected discounted return in a finite Markov chain, and finds that some form of extrapolation with Gauss-Seidel iteration after reordering may turn out to be more efficient in practice than successive over-relaxation.
Action Elimination Procedures for Modified Policy Iteration Algorithms
- Mathematics, Oper. Res.
- 1982
Procedures to eliminate nonoptimal actions for one iteration and for all subsequent iterations are presented, along with encouraging computational results.
COMPUTATIONAL COMPARISON OF VALUE ITERATION ALGORITHMS FOR DISCOUNTED MARKOV DECISION PROCESSES
- Mathematics, Computer Science
This paper describes a computational comparison of value iteration algorithms for discounted Markov decision processes and concludes that the current state-of-the-art approaches to solving these problems are unsatisfactory.
Real Applications of Markov Decision Processes
- Mathematics
- 1985
In the first few years of an ongoing survey of applications of Markov decision processes where the results have been implemented or have had some influence on decisions, few applications have been identified, but there appears to be an increasing effort to model many phenomena as Markov decision processes.
Accelerated procedures for the solution of discrete Markov control problems
- Mathematics, Computer Science
- 1970
Methods for computing optimal controls for a Markov chain model gave a ten-fold decrease in computation time over a more usual procedure of dynamic programming.