Learn More
Upper Confidence Trees are a very efficient tool for solving Markov Decision Processes; originating in difficult games like the game of Go, it is in particular surprisingly efficient in high dimensional problems. It is known that it can be adapted to continuous domains in some cases (in particular continuous action spaces). We here present an extension of(More)
HAL is a multidisciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L'archive ouverte pluridisciplinaire HAL, est destinée au dépôt età la diffusion(More)
In the Bayesian Reinforcement Learning (BRL) setting, agents try to maximise the collected rewards while interacting with their environment while using some prior knowledge that is accessed beforehand. Many BRL algorithms have already been proposed, but the benchmarks used to compare them are only relevant for specific cases. The paper addresses this(More)
In the standard version of the UCT algorithm, in the case of a continuous set of decisions, the exploration of new decisions is done through blind search. This can lead to very inefficient exploration, particularly in the case of large dimension problems, which often happens in energy management problems, for instance. In an attempt to use the information(More)
—Estimating the belief state is the main issue in games with Partial Observation. It is commonly done by heuristic methods, with no mathematical guarantee. We here focus on mathematically consistent belief state estimation methods, in the case of one-player games. We clearly separate the search algorithm (which might be e.g. alpha-beta or Monte-Carlo Tree(More)
Upper Confidence Trees (UCT) are now a well known algorithm for sequential decision making; it is a provably consistent variant of Monte-Carlo Tree Search. However, the consistency is only proved in a the case where both the action space is finite. We here propose a proof in the case of fully observable Markov Decision Processes with bounded horizon,(More)
Current state of the art methods in energy policy planning only approximate the problem (Linear Programming on a finite sample of scenarios, Dynamic Programming on an approximation of the problem , etc). Monte-Carlo Tree Search (MCTS [3]) seems to be a potential candidate to converge to an exact solution of these problems ([2]). But how fast, and how do key(More)