Relative value iteration for average reward semi-Markov control via simulation

@article{Gosavi2013RelativeVI,
  title={Relative value iteration for average reward semi-Markov control via simulation},
  author={Abhijit Gosavi},
  journal={2013 Winter Simulation Conference (WSC)},
  year={2013},
  pages={623--630}
}
  • Abhijit Gosavi
  • Published 2013 in 2013 Winter Simulation Conference (WSC)
This paper studies the semi-Markov decision process (SMDP) under the long-run average reward criterion in the simulation-based context. Using dynamic programming, a straightforward approach for solving this problem involves policy iteration; a value iteration approach for this problem involves a transformation that induces an additional computational burden. In the simulation-based context, however, where one seeks to avoid the transition probabilities needed in dynamic programming, value…
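To make the dynamic-programming baseline mentioned in the abstract concrete, the following is a minimal sketch of relative value iteration on an average-reward SMDP with known transition probabilities. It is not the simulation-based algorithm of the paper. It assumes one standard uniformization-style data transformation (the abstract does not say which transformation it has in mind) and a unichain model. The function name, the array layout (`P[a, i, j]`, `R[a, i]`, `T[a, i]`), and the toy numbers are all hypothetical choices for illustration.

```python
import numpy as np


def smdp_relative_value_iteration(P, R, T, ref_state=0, tol=1e-10, max_iter=100_000):
    """Sketch: relative value iteration on a transformed average-reward SMDP.

    P[a, i, j]: transition probability i -> j under action a (assumed known).
    R[a, i]:    expected reward earned per transition in state i under action a.
    T[a, i]:    expected sojourn time in state i under action a (must be > 0).
    Returns (gain per unit time, relative values h, greedy policy).
    """
    P = np.asarray(P, dtype=float)
    R = np.asarray(R, dtype=float)
    T = np.asarray(T, dtype=float)
    n_states = P.shape[1]
    idx = np.arange(n_states)

    # Pick eta strictly below min over (i, a) of T / (1 - p(i | i, a)),
    # so that every transformed self-transition probability stays positive.
    leaving = 1.0 - P[:, idx, idx]
    mask = leaving > 0
    eta = 0.5 * np.min(T[mask] / leaving[mask]) if mask.any() else 0.5 * T.min()

    # Data transformation (assumed form): reward rates and rescaled
    # transition probabilities with added self-loops.
    R_tilde = R / T
    P_tilde = eta * P / T[:, :, None]
    P_tilde[:, idx, idx] += 1.0 - eta / T

    # Relative value iteration on the transformed discrete-time problem:
    # subtract the value at a reference state after every sweep.
    h = np.zeros(n_states)
    for _ in range(max_iter):
        Q = R_tilde + P_tilde @ h      # shape (actions, states)
        Th = Q.max(axis=0)
        gain = Th[ref_state]
        h_new = Th - gain
        converged = np.max(np.abs(h_new - h)) < tol
        h = h_new
        if converged:
            break

    policy = Q.argmax(axis=0)
    return gain, h, policy


# Hypothetical two-state, two-action example; every action moves to the other state.
P = [[[0.0, 1.0], [1.0, 0.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[2.0, 4.0],   # action 0: reward per transition in states 0 and 1
     [5.0, 4.0]]   # action 1
T = [[1.0, 3.0],   # action 0: mean sojourn times
     [4.0, 3.0]]   # action 1

gain, h, policy = smdp_relative_value_iteration(P, R, T)
print(gain, policy)
```

In this toy model, action 0 in state 0 gives an average reward of (2 + 4) / (1 + 3) = 1.5 per unit time, against (5 + 4) / (4 + 3) = 9/7 for action 1, so the sketch should return a gain of approximately 1.5 and choose action 0 in state 0. The extra work of building `R_tilde` and `P_tilde` from the full transition model illustrates the kind of overhead a transformation adds when transition probabilities are required.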