Certainty Equivalence Policies Are Self-optimizing under Minimax Optimality

  • Csaba Szepesvári
  • Published 1996

Abstract

We show that adaptive real-time dynamic programming, extended with the action selection strategy that chooses the best action according to the latest estimate of the value function, yields asymptotically optimal policies under the minimax optimality criterion within finite time with probability one. From this it follows that learning and exploitation do not…
