Learn More
We consider the problem of learning high-performance Exploration/Exploitation (E/E) strategies for finite Markov Decision Processes (MDPs) when the MDP to be controlled is supposed to be drawn from a known probability distribution p M (·). The performance criterion is the sum of discounted rewards collected by the E/E strategy over an infinite length(More)
In the Bayesian Reinforcement Learning (BRL) setting, agents try to maximise the collected rewards while interacting with their environment while using some prior knowledge that is accessed beforehand. Many BRL algorithms have already been proposed, but the benchmarks used to compare them are only relevant for specific cases. The paper addresses this(More)
Bayesian Reinforcement Learning (BRL) agents aim to maximise the expected collected rewards obtained when interacting with an unknown Markov Decision Process (MDP) while using some prior knowledge. State-of-the-art BRL agents rely on frequent updates of the belief on the MDP, as new observations of the environment are made. This offers theoretical(More)
Faculty of Applied Sciences Department of Electrical Engineering and Computer Science Doctor of Computer Science Offline Policy-search in Bayesian Reinforcement Learning by Michael CASTRONOVO This thesis presents research contributions in the study field of Bayesian Reinforcement Learning — a subfield of Reinforcement Learning where, even though the(More)
  • 1