# Model-based Bayesian Reinforcement Learning in Partially Observable Domains

    @inproceedings{Poupart2008ModelbasedBR,
      title     = {Model-based Bayesian Reinforcement Learning in Partially Observable Domains},
      author    = {Pascal Poupart and Nikos A. Vlassis},
      booktitle = {ISAIM},
      year      = {2008}
    }

Bayesian reinforcement learning in partially observable domains is notoriously difficult, in part due to the unknown form of the beliefs and the optimal value function. We show that beliefs represented by mixtures of products of Dirichlet distributions are closed under belief updates for factored domains. Belief monitoring algorithms that use this mixture representation are proposed. We also show that the optimal value function is a linear combination of products of Dirichlets for factored…
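The closure property stated in the abstract can be illustrated with a small sketch: if each belief component pairs a candidate hidden state with Dirichlet counts over the unknown transition parameters, a single belief update branches over the possible next states and yields again a mixture of Dirichlets, just with more components. The 2-state domain, the known observation matrix, and all names below are illustrative assumptions, not the paper's actual algorithm:

```python
import numpy as np

N_STATES, N_OBS = 2, 2
# Known observation model P(o | s') for illustration; unknown observation
# parameters can be handled with Dirichlet counts in the same way.
OBS = np.array([[0.9, 0.1],
                [0.2, 0.8]])

def update(mixture, action, obs):
    """One belief update: each (weight, state, counts) component branches over
    the possible hidden next states, so the posterior is again a mixture of
    Dirichlets, with one extra level of branching per step."""
    new = []
    for w, s, counts in mixture:
        alpha = counts[s, action]              # Dirichlet counts for (s, a)
        pred = alpha / alpha.sum()             # mean predictive P(s' | s, a)
        for s2 in range(N_STATES):
            w2 = w * pred[s2] * OBS[s2, obs]
            if w2 > 0:
                c2 = counts.copy()
                c2[s, action, s2] += 1.0       # conjugate count update
                new.append((w2, s2, c2))
    z = sum(w for w, _, _ in new)              # renormalize mixture weights
    return [(w / z, s, c) for w, s, c in new]

# Start from a single component: known state 0, uniform Dirichlet(1, ..., 1)
# over next states, with a single action to keep the counts array small.
belief = [(1.0, 0, np.ones((N_STATES, 1, N_STATES)))]
belief = update(belief, action=0, obs=1)
print(len(belief), round(sum(w for w, _, _ in belief), 6))  # → 2 1.0
```

The component count grows with each update, which is why the paper proposes dedicated belief monitoring algorithms to keep the mixture tractable.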

## 86 Citations

Bayesian Reinforcement Learning

- Computer Science · Encyclopedia of Machine Learning
- 2010

This chapter surveys recent lines of work that use Bayesian techniques for reinforcement learning by explicitly maintaining a distribution over various quantities such as the parameters of the model, the value function, the policy or its gradient.

Bayesian Reinforcement Learning

- Computer Science · Reinforcement Learning
- 2012

This chapter surveys recent lines of work that use Bayesian techniques for reinforcement learning by explicitly maintaining a distribution over various quantities such as the parameters of the model, the value function, the policy or its gradient.

Acting and Bayesian Reinforcement Structure Learning of Partially Observable Environment

- Mathematics, Computer Science · ITAT
- 2014

This article shows how to learn both the structure and the parameters of a partially observable environment simultaneously, while also performing online a near-optimal sequence of actions, taking into…

Monte Carlo Bayesian Reinforcement Learning

- Mathematics, Computer Science · ICML
- 2012

Bayesian reinforcement learning (BRL) encodes prior knowledge of the world in a model and represents uncertainty in model parameters by maintaining a probability distribution over them. This paper…

Nonparametric Bayesian Policy Priors for Reinforcement Learning

- Computer Science · NIPS
- 2010

This work considers reinforcement learning in partially observable domains where the agent can query an expert for demonstrations and introduces priors that bias the agent towards models with both simple representations and simple policies, resulting in improved policy and model learning.

Reinforcement learning with limited reinforcement: using Bayes risk for active learning in POMDPs

- Computer Science, Medicine · ICML '08
- 2008

This paper presents an approximation approach that allows the POMDP model parameters to be treated as additional hidden state in a "model-uncertainty" POMDP.

A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes

- Mathematics, Computer Science · J. Mach. Learn. Res.
- 2011

This paper introduces the Bayes-Adaptive Partially Observable Markov Decision Processes, a new framework that can be used to simultaneously learn a model of the POMDP domain through interaction with the environment, and track the state of the system under partial observability.

Model-based Bayesian Reinforcement Learning in Factored Markov Decision Process

- Computer Science · J. Comput.
- 2014

The proposed model-based factored Bayesian reinforcement learning (F-BRL) approach can effectively reduce the number of learning parameters, and enable online learning for dynamic systems with thousands of states.

Monte-Carlo Bayesian Reinforcement Learning Using a Compact Factored Representation

- Computer Science · 2017 4th International Conference on Information Science and Control Engineering (ICISCE)
- 2017

This paper proposes a novel Monte Carlo tree search approach for Bayesian reinforcement learning using a compact factored representation, solving the Bayesian reinforcement learning problem online.

MCTS on model-based Bayesian Reinforcement Learning for efficient learning in Partially Observable environments

- Computer Science
- 2018

This work focuses on solving partially observable domains, typically modeled as Partially Observable Markov Decision Processes (POMDPs), which are well-known to be hard to solve due to uncertainty as a result of stochastic transitions, partial observability, and unknown dynamics.

## References

Showing 1–10 of 20 references.

A Bayesian Framework for Reinforcement Learning

- Computer Science · ICML
- 2000

It is proposed that the learning process estimate the full posterior distribution over models online; to determine behavior, a hypothesis is sampled from this distribution and the greedy policy with respect to that hypothesis is obtained by dynamic programming.
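The sample-then-plan loop summarized above can be sketched in a few lines: draw one model from a conjugate posterior, then run dynamic programming on that hypothesis. The Dirichlet posterior over transitions, the known rewards, and the toy problem sizes are illustrative assumptions, not details from the cited paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N_S, N_A = 3, 2
GAMMA = 0.95

# Dirichlet posterior over transition probabilities, one count vector per
# (s, a) pair; rewards are assumed known here for brevity.
counts = np.ones((N_S, N_A, N_S))
R = rng.random((N_S, N_A))

def sample_model(counts):
    """Draw one transition model T ~ posterior (one Dirichlet per (s, a))."""
    T = np.empty_like(counts)
    for s in range(N_S):
        for a in range(N_A):
            T[s, a] = rng.dirichlet(counts[s, a])
    return T

def greedy_policy(T, iters=200):
    """Dynamic programming (value iteration) on the sampled hypothesis."""
    V = np.zeros(N_S)
    for _ in range(iters):
        Q = R + GAMMA * (T @ V)      # T @ V broadcasts to shape (N_S, N_A)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

pi = greedy_policy(sample_model(counts))
print(pi.shape)  # → (3,)  one greedy action per state
```

Acting on a single posterior sample, rather than the posterior mean, is what gives this scheme its exploratory behavior: uncertain models occasionally get sampled and tried.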

An analytic solution to discrete Bayesian reinforcement learning

- Computer Science · ICML
- 2006

This work proposes a new algorithm, called BEETLE, for effective online learning that is computationally efficient while minimizing the amount of exploration, and takes a Bayesian model-based approach, framing RL as a partially observable Markov decision process.

Model based Bayesian Exploration

- Computer Science, Mathematics · UAI
- 1999

This paper explicitly represents uncertainty about the parameters of the model and builds probability distributions over Q-values based on these, which are used to compute a myopic approximation to the value of information for each action and hence to select the action that best balances exploration and exploitation.
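A minimal sketch of the myopic value-of-perfect-information idea described above, assuming Q-value uncertainty is represented by posterior samples; the toy numbers and function names are illustrative, not the cited paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def vpi_from_samples(q_samples):
    """Myopic value of perfect information per action, estimated from sampled
    Q-values (rows: posterior samples, columns: actions): the expected gain
    if an action's true value were revealed before acting."""
    means = q_samples.mean(axis=0)
    order = np.argsort(means)[::-1]
    a1, a2 = order[0], order[1]            # best and second-best by mean
    vpi = np.empty(means.size)
    for a in range(means.size):
        q = q_samples[:, a]
        if a == a1:                        # gain if the best action turns out bad
            vpi[a] = np.maximum(means[a2] - q, 0.0).mean()
        else:                              # gain if another action turns out good
            vpi[a] = np.maximum(q - means[a1], 0.0).mean()
    return means, vpi

# Toy Q-value posterior samples for 3 actions: action 1 has a slightly lower
# mean than action 0 but much higher uncertainty (illustrative numbers only).
samples = rng.normal(loc=[1.0, 0.9, 0.2], scale=[0.1, 0.5, 0.1],
                     size=(5000, 3))
means, vpi = vpi_from_samples(samples)
choice = int(np.argmax(means + vpi))       # balances exploitation and exploration
print(choice)
```

Adding VPI to the mean Q-value makes the highly uncertain action 1 competitive with action 0, which is exactly the exploration bonus this criterion is meant to provide.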

Active Learning in Partially Observable Markov Decision Processes

- Computer Science · ECML
- 2005

Results show good performance of the algorithm even in large problems: the most useful parameters of the model are learned quickly and the agent still accumulates high reward throughout the process.

Bayesian sparse sampling for on-line reward optimization

- Computer Science · ICML
- 2005

The idea is to grow a sparse lookahead tree, intelligently, by exploiting information in a Bayesian posterior---rather than enumerate action branches (standard sparse sampling) or compensate myopically (value of perfect information).

Point-Based Value Iteration for Continuous POMDPs

- Mathematics, Computer Science · J. Mach. Learn. Res.
- 2006

It is demonstrated that the value function for continuous POMDPs is convex in the beliefs over continuous state spaces, and piecewise-linear convex for the particular case of discrete observations and actions but continuous states.

Using Linear Programming for Bayesian Exploration in Markov Decision Processes

- Computer Science · IJCAI
- 2007

Ideas for making this Bayesian model of the environment as a Markov decision process computationally tractable are explored, and finite-length trajectories are sampled from the infinite tree using ideas based on sparse sampling.

Learning in Graphical Models

- Computer Science · NATO ASI Series
- 1998

This paper presents an introduction to inference for Bayesian networks and a view of the EM algorithm that justifies incremental, sparse and other variants, as well as an information-theoretic analysis of hard and soft assignment methods for clustering.

Scalable Internal-State Policy-Gradient Methods for POMDPs

- Computer Science · ICML
- 2002

Several improved algorithms for learning policies with memory in an infinite-horizon setting are developed, both directly when a known model of the environment is available and via simulation otherwise.

Scaling Internal-State Policy-Gradient Methods for POMDPs

- 2002

Policy-gradient methods have received increased attention recently as a mechanism for learning to act in partially observable environments. They have shown promise for problems admitting memoryless…