A time aggregation approach to Markov decision processes


We propose a time aggregation approach for solving infinite-horizon average-cost Markov decision processes via policy iteration. In this approach, the policy is updated only when the process visits a subset of the state space. As in state aggregation, this leads to a reduced state space, which can substantially reduce computational and storage requirements, especially for problems with certain structural properties. However, in contrast to state aggregation, which generally results in an approximate model due to the loss of the Markov property, time aggregation suffers no loss of accuracy, because the Markov property is preserved. Single-sample-path-based estimation algorithms are developed that allow the time aggregation approach to be implemented on-line for practical systems. Numerical and simulation examples are presented to illustrate the ideas and the potential computational savings. © 2002 Elsevier Science Ltd. All rights reserved.
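
The claim that time aggregation loses no accuracy can be illustrated for a single fixed policy. The sketch below is not the paper's algorithm; it is a minimal illustration that assumes an ergodic chain with a known transition matrix `P` and per-state cost vector `f`, and it uses the standard embedded-chain construction on a subset of states. The function names `stationary` and `time_aggregated_average_cost`, and the random test chain, are hypothetical and chosen only for this example.

```python
import numpy as np

def stationary(P):
    """Stationary distribution of an ergodic transition matrix P."""
    n = P.shape[0]
    # Solve pi (P - I) = 0 together with sum(pi) = 1 by least squares.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def time_aggregated_average_cost(P, f, subset):
    """Average cost computed only from the chain embedded on `subset`."""
    n = P.shape[0]
    s1 = np.asarray(subset)
    s2 = np.setdiff1d(np.arange(n), s1)
    P11, P12 = P[np.ix_(s1, s1)], P[np.ix_(s1, s2)]
    P21, P22 = P[np.ix_(s2, s1)], P[np.ix_(s2, s2)]
    # Expected number of visits to each outside state during an excursion.
    N = np.linalg.inv(np.eye(len(s2)) - P22)
    P_emb = P11 + P12 @ N @ P21              # transitions between subset visits
    h_f = f[s1] + P12 @ N @ f[s2]            # expected cost over one segment
    h_1 = 1.0 + P12 @ N @ np.ones(len(s2))   # expected length of one segment
    pi_emb = stationary(P_emb)
    # Renewal-reward ratio: cost per segment over time per segment.
    return (pi_emb @ h_f) / (pi_emb @ h_1)

# Hypothetical 6-state chain with strictly positive transition probabilities.
rng = np.random.default_rng(0)
P = rng.random((6, 6)) + 0.1
P /= P.sum(axis=1, keepdims=True)
f = rng.random(6)

direct = stationary(P) @ f                               # full state space
agg = time_aggregated_average_cost(P, f, subset=[0, 2])  # two-state embedded chain
print(direct, agg)
```

Under these assumptions the two printed values agree up to floating-point error, even though the second is computed from a two-state embedded chain: the segments between successive visits to the subset carry all the cost and duration information, so nothing is approximated.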

Cite this paper

@inproceedings{Caoa1999ATA, title={A time aggregation approach to Markov decision processes}, author={Xi-Ren Cao and Zhiyuan Ren and Shalabh Bhatnagar and Michael Fu and Steven Marcus and Tamer Basar}, year={1999} }