Potential-based reward shaping for finite horizon online POMDP planning

Authors: Adam Eck, Leen-Kiat Soh, Sam Devlin, and Daniel Kudenko
Journal: Autonomous Agents and Multi-Agent Systems
In this paper, we address the problem of suboptimal behavior during online partially observable Markov decision process (POMDP) planning caused by time constraints on planning. Taking inspiration from the related field of reinforcement learning (RL), our solution is to shape the agent’s reward function in order to lead the agent to large future rewards without having to spend as much time explicitly estimating cumulative future rewards, enabling the agent to save time to improve the breadth…
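
To make the idea of shaping the reward function concrete, here is a minimal sketch of the standard potential-based form from the RL literature, applied to belief states: the shaped reward adds gamma * Phi(b') - Phi(b) to the original reward. The names `shaped_reward` and `max_belief_potential`, the choice of potential, and the discount value are illustrative assumptions, not the definitions used in the paper.

```python
from typing import Callable, Sequence

# A belief is a probability distribution over hidden states.
Belief = Sequence[float]


def max_belief_potential(belief: Belief) -> float:
    """Hypothetical potential: probability of the most likely state.

    Assumption for illustration only; the paper's potentials may differ.
    """
    return max(belief)


def shaped_reward(
    reward: float,
    belief: Belief,
    next_belief: Belief,
    potential: Callable[[Belief], float],
    gamma: float = 0.95,
) -> float:
    """Return reward + gamma * Phi(next_belief) - Phi(belief)."""
    return reward + gamma * potential(next_belief) - potential(belief)


if __name__ == "__main__":
    before = [0.5, 0.5]  # agent is uncertain about the state
    after = [0.9, 0.1]   # an observation made the agent more certain
    # The transition earns a bonus because the potential increased.
    print(shaped_reward(0.0, before, after, max_belief_potential))
```

With this form, a transition that raises the potential is rewarded immediately, which is how shaping can steer a planner toward promising beliefs without a deep lookahead.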