A Modified Policy Iteration Algorithm for Discounted Reward Markov Decision Processes

Abstract

The running time of classical algorithms for Markov Decision Processes (MDPs) typically grows linearly with the size of the state space, which frequently makes them intractable. This paper presents a Modified Policy Iteration algorithm for computing an optimal policy for large Markov decision processes under the discounted reward criterion and an infinite horizon…
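
The abstract is truncated and does not spell out the paper's specific modification, so the following is only a minimal sketch of the generic textbook modified policy iteration scheme for a discounted, infinite-horizon MDP: a greedy improvement step followed by a fixed number of policy-evaluation backups. The function name, the data layout (`P[s][a]` as a list of `(next_state, probability)` pairs, `R[s][a]` as the expected immediate reward), and the parameters `m`, `tol`, and `max_iter` are all assumptions made for illustration and are not taken from the paper.

```python
def modified_policy_iteration(P, R, gamma=0.9, m=5, tol=1e-8, max_iter=10_000):
    """Generic modified policy iteration (illustrative sketch, not the paper's method).

    P[s][a]: list of (next_state, probability) pairs  (assumed layout)
    R[s][a]: expected immediate reward                (assumed layout)
    gamma:   discount factor in [0, 1)
    m:       number of partial policy-evaluation sweeps per iteration
    """
    n = len(P)
    V = [0.0] * n
    policy = [0] * n

    def q(s, a, values):
        # One-step lookahead value of taking action a in state s.
        return R[s][a] + gamma * sum(p * values[t] for t, p in P[s][a])

    for _ in range(max_iter):
        # Policy improvement: act greedily with respect to the current V.
        new_V = []
        for s in range(n):
            best_a = max(range(len(P[s])), key=lambda a: q(s, a, V))
            policy[s] = best_a
            new_V.append(q(s, best_a, V))

        residual = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if residual < tol:
            break

        # Partial policy evaluation: m backups under the fixed greedy policy,
        # instead of solving the linear system exactly as in policy iteration.
        for _ in range(m):
            V = [q(s, policy[s], V) for s in range(n)]

    return policy, V
```

With `m = 0` this sketch reduces to value iteration, and as `m` grows it approaches classical policy iteration; the intermediate setting trades evaluation accuracy for cheaper iterations, which is the usual motivation for the modified scheme.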
