Reinforcement Learning in POMDPs Without Resets
We present new algorithms for learning in POMDPs which guarantee that the agent will obtain the optimal average reward in the limit.
Planning in POMDPs Using Multiplicity Automata
We show that POMDPs can be represented by multiplicity automata with no increase in the representation size.
On-line Markov Decision Processes
We consider an MDP setting in which the reward function is allowed to change during each time step of play (possibly in an adversarial manner), yet the dynamics remain fixed. Similar to the experts...