Reinforcement Learning for POMDPs Based on Action Values and Stochastic Optimization


We present a new, model-free reinforcement learning algorithm for learning to control partially observable Markov decision processes. The algorithm incorporates ideas from action-value based reinforcement learning approaches, such as Q-Learning, as well as ideas from the stochastic optimization literature. Key to our approach is a new definition of action…


3 Figures and Tables