Chapter 3 Introduction to Markov Decision Processes


There is a very elegant theory for solving stochastic, dynamic programs if we are willing to live within some fairly limiting assumptions. Assume that we have a discrete state space S = {1, 2, ..., |S|}, where S is small enough to enumerate. Next assume that there is a relatively small set of decisions or actions, which we denote by a in A, and that we can compute a cost (if minimizing) or contribution (if maximizing) given by C(s, a). Finally, assume that we are given a one-step transition matrix p_t(S_{t+1} | S_t, a_t), which gives the probability that if we are in state S_t at time t and take action a_t, then we will next be in state S_{t+1}.

From time to time, we are going to switch gears to consider problems where the decision is a vector. When this happens, we will use x as our decision variable. However, there are many applications where the number of actions is discrete and small, and there are many algorithms that are specifically designed for small action spaces. In particular, the material in this chapter is designed for small action spaces, and as a result we use a for action throughout.

There are many problems where states are continuous, or where the state variable is a vector, producing a state space that is far too large to enumerate. In addition, the one-step transition matrix p_t(S_{t+1} | S_t, a_t) can be difficult or impossible to compute. So why cover material that is widely acknowledged to work only on small or highly specialized problems? First, some problems have small state and action spaces and can be solved with these techniques. Second, the theory of Markov decision processes can be used to identify structural properties that can dramatically simplify computational algorithms. But far more importantly, this material provides the intellectual foundation for the types of algorithms that we present in later chapters. Using the framework in this chapter, we
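To make the assumptions above concrete, the following sketch represents a small finite MDP directly as arrays: an enumerable state space, a small action set, a contribution table C(s, a), and a one-step transition matrix p(s' | s, a). All numerical values here are hypothetical, chosen only to illustrate the data structures; the computation of Q(s, a) = C(s, a) + sum over s' of p(s'|s,a) V(s') anticipates the value-based algorithms discussed later.

```python
import numpy as np

# A toy finite MDP: |S| = 3 states, |A| = 2 actions (illustrative numbers only).
n_states, n_actions = 3, 2

# Contribution C(s, a): payoff for taking action a in state s (maximizing).
C = np.array([[1.0, 0.5],
              [0.0, 2.0],
              [0.5, 1.5]])

# One-step transition matrix, indexed P[a, s, s_next] = p(s_next | s, a).
P = np.array([
    [[0.7, 0.2, 0.1],   # action a = 0
     [0.1, 0.8, 0.1],
     [0.3, 0.3, 0.4]],
    [[0.5, 0.5, 0.0],   # action a = 1
     [0.0, 0.6, 0.4],
     [0.2, 0.2, 0.6]],
])

# Each row of each action's transition matrix must be a probability distribution.
assert np.allclose(P.sum(axis=2), 1.0)

# Given a value estimate V over states, the one-step lookahead value of each
# state-action pair is Q[s, a] = C(s, a) + sum_{s'} p(s' | s, a) V(s').
V = np.zeros(n_states)            # trivial initial estimate
Q = C + (P @ V).T                 # (P @ V) has shape (n_actions, n_states)
best_action = Q.argmax(axis=1)    # greedy action in each state
```

With V initialized to zero, the greedy action simply maximizes the immediate contribution in each state; iterating this lookahead is the basis of the dynamic programming algorithms that exploit this small-state, small-action structure.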


Cite this paper

@inproceedings{Powell2011Chapter3I,
  title  = {Chapter 3 Introduction to Markov Decision Processes},
  author = {Warren B. Powell},
  year   = {2011}
}