A Modified Policy Iteration Algorithm for Discounted Reward Markov Decision Processes


The running time of classical algorithms for Markov decision processes (MDPs) typically grows linearly with the size of the state space, which frequently makes them intractable. This paper presents a Modified Policy Iteration algorithm that computes an optimal policy for large Markov decision processes under the discounted reward criterion over an infinite horizon…
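For orientation, the sketch below shows the general textbook form of modified policy iteration for a finite discounted MDP: a greedy improvement step followed by a fixed number of partial evaluation sweeps. It is not the paper's algorithm, whose details are cut off in the abstract above. The function name, the nested-list layout of `P` and `R`, the sweep count `m`, the tolerance, and the two-state example are all assumptions made for illustration.

```python
def modified_policy_iteration(P, R, gamma, m=5, tol=1e-8, max_iter=10_000):
    """Generic modified policy iteration (illustrative, not the paper's method).

    P[a][s][t]: probability of moving from state s to state t under action a.
    R[a][s]:    expected immediate reward for taking action a in state s.
    gamma:      discount factor, 0 <= gamma < 1.
    m:          partial evaluation sweeps per iteration (assumed default).
    """
    n_actions, n_states = len(P), len(P[0])

    def q(a, s, v):
        # One-step lookahead value of taking action a in state s.
        return R[a][s] + gamma * sum(P[a][s][t] * v[t] for t in range(n_states))

    v = [0.0] * n_states
    policy = [0] * n_states
    for _ in range(max_iter):
        # Policy improvement: act greedily with respect to the current values.
        policy = [max(range(n_actions), key=lambda a: q(a, s, v))
                  for s in range(n_states)]
        v_new = [q(policy[s], s, v) for s in range(n_states)]
        delta = max(abs(x - y) for x, y in zip(v_new, v))
        v = v_new
        if delta < tol:
            break
        # Partial policy evaluation: m extra backups under the fixed policy,
        # instead of solving the linear system exactly.
        for _ in range(m):
            v = [q(policy[s], s, v) for s in range(n_states)]
    return policy, v


# Hypothetical two-state, two-action MDP: action 0 stays put, action 1 switches.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 2.0],
     [1.0, 0.0]]
policy, v = modified_policy_iteration(P, R, gamma=0.9)
```

In this toy example the greedy policy switches out of state 0 and stays in state 1, with values close to 19 and 20. Setting `m=0` reduces the loop to value iteration, while a very large `m` approaches exact policy iteration, which is the trade-off the modified scheme is meant to tune.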

