
- Published 2007 in Discrete Event Dynamic Systems, DOI: 10.1007/s10626-007-0014-3

This paper considers the problem of computing an optimal policy for a Markov Decision Process (MDP) without complete a priori knowledge of (i) the branching probability distributions that determine the evolution of the process state upon execution of the different actions, and (ii) the probability distributions characterizing the immediate rewards…
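The setting described in the abstract, learning an optimal policy when neither the transition nor the reward distributions are known in advance, is the classic model-free reinforcement-learning problem. The sketch below is not the paper's algorithm; it illustrates the problem setting with standard Q-learning on an invented two-state MDP (all states, actions, and probabilities here are hypothetical, chosen only so that the optimal policy is easy to see).

```python
import random

random.seed(0)

# Invented 2-state, 2-action MDP. step() samples a transition and a reward;
# the learner never reads these probabilities directly, matching the
# "unknown distributions" assumption in the abstract.
def step(s, a):
    if s == 0:
        # in state 0, action 1 usually moves to the rewarding state 1
        p_to_1 = 0.9 if a == 1 else 0.1
        s_next = 1 if random.random() < p_to_1 else 0
        return s_next, 0.0
    else:
        # in state 1, action 0 collects a unit reward; either way the
        # process usually resets to state 0
        r = 1.0 if a == 0 else 0.0
        s_next = 0 if random.random() < 0.9 else 1
        return s_next, r

def q_learning(n_steps=50000, alpha=0.1, gamma=0.9, eps=0.2):
    Q = [[0.0, 0.0], [0.0, 0.0]]  # Q[state][action]
    s = 0
    for _ in range(n_steps):
        # epsilon-greedy exploration
        if random.random() < eps:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda x: Q[s][x])
        s_next, r = step(s, a)
        # standard Q-learning update toward the sampled Bellman target
        target = r + gamma * max(Q[s_next])
        Q[s][a] += alpha * (target - Q[s][a])
        s = s_next
    return Q

Q = q_learning()
# Greedy policy recovered from Q: move toward state 1, then collect reward.
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in (0, 1)]
print(policy)
```

The learner here only interacts with `step()` through samples, which is the key feature of the problem class the paper addresses; the paper itself develops a different, more structured approach to the same setting.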
