Accelerating Procedures of the Value Iteration Algorithm for Discounted Markov Decision Processes, Based on a One-Step Lookahead Analysis

@article{Herzberg1994AcceleratingPO,
  title={Accelerating Procedures of the Value Iteration Algorithm for Discounted Markov Decision Processes, Based on a One-Step Lookahead Analysis},
  author={Meir Herzberg and Uri Yechiali},
  journal={Operations Research},
  year={1994},
  volume={42},
  pages={940-946}
}
Accelerating procedures for solving discounted Markov decision process problems are developed based on a one-step lookahead analysis of the value iteration algorithm. We apply the criteria of minimum difference and minimum variance to obtain good adaptive relaxation factors that speed up the convergence of the algorithm. Several problems, including Howard's automobile replacement problem, are tested, and a preliminary numerical evaluation reveals considerable reductions in computation time when compared…
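To make the idea concrete, here is a minimal sketch of relaxed value iteration with a one-step lookahead choice of the relaxation factor. It assumes a tabular MDP with transition tensor P of shape (A, S, S) and reward array r of shape (A, S), uses the common relaxation form v + ω(Tv − v), and picks ω from a small hypothetical candidate grid by minimizing the span of the next difference vector; this selection is an illustrative stand-in for the paper's minimum-difference criterion, not its exact procedure.

```python
import numpy as np

def bellman_backup(v, P, r, gamma):
    """One Bellman optimality backup: (Tv)(s) = max_a [r(s,a) + gamma * sum_t P(t|s,a) v(t)]."""
    q = r + gamma * np.einsum('ast,t->as', P, v)  # (A, S) action values
    return q.max(axis=0)

def relaxed_value_iteration(P, r, gamma, omegas=(1.0, 1.2, 1.5, 2.0),
                            tol=1e-8, max_iter=10_000):
    """Relaxed VI with an adaptive relaxation factor chosen by one-step lookahead.

    Candidate steps v + omega * (Tv - v) are compared by the span of their
    next difference vector, and the smallest-span candidate is kept.
    The candidate grid `omegas` is a hypothetical choice; this is a sketch
    of the general technique, not the paper's exact minimum-difference rule.
    """
    v = np.zeros(r.shape[1])
    for _ in range(max_iter):
        d = bellman_backup(v, P, r, gamma) - v
        if d.max() - d.min() < tol:           # span-based stopping test
            break
        best, best_span = None, np.inf
        for omega in omegas:                  # one-step lookahead per candidate
            cand = v + omega * d
            look = bellman_backup(cand, P, r, gamma) - cand
            span = look.max() - look.min()
            if span < best_span:
                best, best_span = cand, span
        v = best
    return v
```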

Citations

Acceleration Operators in the Value Iteration Algorithms for Markov Decision Processes
We study the general approach to accelerating the convergence of the most widely used solution method for Markov decision processes (MDPs) with total expected discounted reward. Inspired by the…
Acceleration of Iterative Methods for Markov Decision Processes
A class of operators is presented that can be integrated into value iteration and modified policy iteration algorithms for Markov decision processes so as to speed up the convergence of the iterative search.
Accelerated modified policy iteration algorithms for Markov decision processes
In the new policy iteration, an additional operator is applied to the iterate generated by the Markov operator, yielding a larger improvement in each iteration of the modified policy iteration method.
An Adaptive State Aggregation Algorithm for Markov Decision Processes
This paper proposes an intuitive algorithm for solving MDPs that reduces the cost of value iteration updates by dynamically grouping together states with similar cost-to-go values, and proves that the algorithm converges almost surely to within 2ε/(1 − γ) of the true optimal value in the ℓ∞ norm.
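As a rough illustration of the grouping step, the sketch below buckets states whose current value estimates fall in the same ε-wide interval; the floor-based bucketing rule is a hypothetical simplification, not the paper's adaptive aggregation scheme.

```python
import numpy as np

def aggregate_by_value(v, eps):
    """Group states whose current cost-to-go estimates lie in the same
    eps-wide bucket; aggregated backups can then be applied per group.
    Bucketing by floor(v/eps) is a hypothetical simplification of the
    paper's adaptive aggregation rule."""
    buckets = np.floor(v / eps).astype(int)
    _, labels = np.unique(buckets, return_inverse=True)
    return labels  # labels[s] = index of the aggregate containing state s
```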
A First-Order Approach to Accelerated Value Iteration
The authors provide a lower bound on the convergence rate of any first-order algorithm for solving MDPs, showing that no such algorithm can converge faster than VI in the worst case, and introduce safe accelerated value iteration (S-AVI), which alternates between accelerated updates and value iteration updates.
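The alternation can be pictured with a Nesterov-style momentum step interleaved with plain Bellman backups; the momentum coefficient and safeguard schedule below are hypothetical, and this sketch is only loosely in the spirit of S-AVI rather than its actual safeguarded scheme.

```python
import numpy as np

def accelerated_vi(bellman, v0, beta=0.9, n_iter=500, safeguard_every=10):
    """Alternate momentum-accelerated updates with plain value iteration
    backups. `bellman` is any callable applying the Bellman operator T;
    beta and safeguard_every are hypothetical choices for illustration."""
    v_prev = v0.copy()
    v = bellman(v0)
    for k in range(n_iter):
        if k % safeguard_every == 0:
            v_next = bellman(v)              # plain VI (safeguard) update
        else:
            y = v + beta * (v - v_prev)      # momentum extrapolation
            v_next = bellman(y)              # accelerated update
        v_prev, v = v, v_next
    return v
```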
Discounted Markov Decision Processes with Constrained Costs: the decomposition approach
The decomposition of the state space into strongly communicating classes is investigated as a means of computing an optimal or nearly optimal stationary policy for discrete-time Markov decision processes with finite state and action spaces.
General dynamic programming algorithms applied to polling systems
A new implementation of the modified policy iteration (MPI) dynamic programming algorithm is developed to efficiently solve problems with large state spaces and small action spaces; computational results provide evidence that MPI outperforms the other algorithms for both the discounted-cost and the average-cost optimal polling problems.

References

Showing 1-10 of 17 references
The action elimination algorithm for Markov decision processes
An efficient algorithm for solving Markov decision problems is proposed; its nonoptimality test, an extension of Hastings' test for the undiscounted-reward case, identifies actions that cannot be optimal at the current stage.
Modified Policy Iteration Algorithms for Discounted Markov Decision Problems
A class of modified policy iteration algorithms for solving Markov decision problems, corresponding to policy evaluation by successive approximations, is analyzed; all of these algorithms are shown to converge at least as quickly as successive approximations, and estimates of their rates of convergence are obtained.
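A minimal tabular sketch of the idea: policy evaluation is truncated to m successive-approximation sweeps under the greedy policy instead of an exact linear solve, so m = 0 recovers value iteration and large m approaches policy iteration. The array shapes and the parameter m are assumptions of this sketch, not the authors' exact formulation.

```python
import numpy as np

def modified_policy_iteration(P, r, gamma, m=5, tol=1e-8, max_iter=1_000):
    """Modified policy iteration with m truncated evaluation sweeps.
    P: (A, S, S) transitions, r: (A, S) rewards; a sketch for illustration."""
    A, S, _ = P.shape
    v = np.zeros(S)
    idx = np.arange(S)
    for _ in range(max_iter):
        q = r + gamma * np.einsum('ast,t->as', P, v)
        pi = q.argmax(axis=0)                    # policy improvement
        v_new = q[pi, idx]                       # first backup under pi
        P_pi, r_pi = P[pi, idx], r[pi, idx]      # (S, S) and (S,)
        for _ in range(m):                       # truncated policy evaluation
            v_new = r_pi + gamma * P_pi @ v_new
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, pi
        v = v_new
    return v, pi
```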
The Convergence of Value Iteration in Discounted Markov Decision Processes
Considerable numerical experience indicates that the standard value iteration procedure for infinite-horizon discounted Markov decision processes performs much better than the usual error…
Letter to the Editor - A Test for Suboptimal Actions in Markovian Decision Problems
A simple test is described that may show certain actions to be suboptimal, permanently eliminating them from further consideration; the test may be incorporated into the dynamic programming routine for solving the decision problem.
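The flavor of such a test can be sketched with MacQueen-style bounds derived from the span of Tv − v: an action whose optimistic Q-value falls below some other action's pessimistic Q-value in the same state can never be optimal. This is a generic illustration under assumed tabular shapes, not the exact test from the letter.

```python
import numpy as np

def eliminate_suboptimal_actions(P, r, gamma, v):
    """Suboptimality test from a current iterate v using the bounds
    Tv + gamma/(1-gamma)*min(d) <= v* <= Tv + gamma/(1-gamma)*max(d),
    where d = Tv - v. A sketch of the general idea, not the letter's
    exact test. P: (A, S, S), r: (A, S)."""
    q = r + gamma * np.einsum('ast,t->as', P, v)      # (A, S)
    tv = q.max(axis=0)
    d = tv - v
    lo = tv + gamma * d.min() / (1 - gamma)           # lower bound on v*
    hi = tv + gamma * d.max() / (1 - gamma)           # upper bound on v*
    q_hi = r + gamma * np.einsum('ast,t->as', P, hi)  # optimistic Q-values
    q_lo = r + gamma * np.einsum('ast,t->as', P, lo)  # pessimistic Q-values
    keep = q_hi >= q_lo.max(axis=0, keepdims=True)    # (A, S) boolean mask
    return keep  # keep[a, s] == False means action a is suboptimal in s
```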
Technical Note - Accelerated Computation of the Expected Discounted Return in a Markov Chain
This note investigates the use of extrapolation with certain iterative methods to accelerate the computation of the expected discounted return in a finite Markov chain, finding that some form of extrapolation with Gauss-Seidel iteration after reordering may prove more efficient in practice than successive over-relaxation.
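For reference, a minimal Gauss-Seidel sweep for the fixed-chain equation v = r + γPv looks as follows; the reordering and extrapolation steps the note studies are deliberately left out, so this only shows the baseline iteration they accelerate.

```python
import numpy as np

def gauss_seidel_return(P, r, gamma, tol=1e-10, max_sweeps=10_000):
    """Gauss-Seidel iteration for v = r + gamma * P v in a fixed Markov
    chain: states are updated in place, so later states in a sweep already
    see this sweep's fresh values. A minimal sketch without reordering or
    extrapolation."""
    v = np.zeros(len(r))
    for _ in range(max_sweeps):
        delta = 0.0
        for s in range(len(r)):
            new = r[s] + gamma * (P[s] @ v)   # uses already-updated entries
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta < tol:
            break
    return v
```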
Action Elimination Procedures for Modified Policy Iteration Algorithms
Procedures to eliminate nonoptimal actions, both for a single iteration and for all subsequent iterations, are presented together with encouraging computational results.
Computational Comparison of Value Iteration Algorithms for Discounted Markov Decision Processes
This paper describes a computational comparison of value iteration algorithms for discounted Markov decision processes and concludes that the current state-of-the-art approaches to solving these problems are unsatisfactory.
Real Applications of Markov Decision Processes
In the first few years of an ongoing survey of applications of Markov decision processes whose results have been implemented or have influenced decisions, few applications have been identified, but there appears to be an increasing effort to model many phenomena as Markov decision processes.
Accelerated procedures for the solution of discrete Markov control problems
Methods for computing optimal controls for a Markov chain model are presented that gave a ten-fold decrease in computation time over a more usual dynamic programming procedure.