Corpus ID: 17395633

Model-based Bayesian Reinforcement Learning in Partially Observable Domains

  • Pascal Poupart, Nikos A. Vlassis
Bayesian reinforcement learning in partially observable domains is notoriously difficult, in part due to the unknown form of the beliefs and the optimal value function. We show that beliefs represented by mixtures of products of Dirichlet distributions are closed under belief updates for factored domains. Belief monitoring algorithms that use this mixture representation are proposed. We also show that the optimal value function is a linear combination of products of Dirichlets for factored… 
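The closure property stated in the abstract can be illustrated with a toy sketch: under partial observability the hidden state transition is unobserved, so each possible transition spawns one new mixture component whose Dirichlet counts are incremented for that transition and whose weight is rescaled by the observation likelihood. The two-state setup, data structures, and `obs_model` below are illustrative assumptions, not the paper's actual representation.

```python
# Hedged sketch: a belief over unknown transition parameters, kept as a
# mixture of Dirichlet count vectors (one component per hidden-state
# hypothesis).  Illustrative only; not the paper's implementation.

STATES = [0, 1]

def belief_update(mixture, action, obs, obs_model):
    """One Bayesian belief update under partial observability.

    mixture: list of (weight, state, counts), where counts[(s, a)] is a
    list of Dirichlet counts over next states.  Each hidden transition
    s -> s2 spawns a new component, which is why a mixture of Dirichlet
    products is closed under belief updates.
    """
    new = []
    for w, s, counts in mixture:
        alpha = counts[(s, action)]
        total = sum(alpha)
        for s2 in STATES:
            # predictive prob of s -> s2 times the observation likelihood
            w2 = w * (alpha[s2] / total) * obs_model[s2][obs]
            if w2 == 0.0:
                continue
            c2 = {k: list(v) for k, v in counts.items()}
            c2[(s, action)][s2] += 1  # Dirichlet count increment
            new.append((w2, s2, c2))
    z = sum(w for w, _, _ in new)  # renormalize the mixture weights
    return [(w / z, s, c) for w, s, c in new]
```

Starting from a single uniform-prior component, each update multiplies the number of components by (at most) the number of hidden states, which is why the cited follow-up work studies compact or sampled approximations of the mixture.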

Bayesian Reinforcement Learning

  • P. Poupart
  • Computer Science
    Encyclopedia of Machine Learning
  • 2010
This chapter surveys recent lines of work that use Bayesian techniques for reinforcement learning by explicitly maintaining a distribution over various quantities such as the parameters of the model, the value function, the policy or its gradient.

Acting and Bayesian Reinforcement Structure Learning of Partially Observable Environment

This article shows how to learn both the structure and the parameters of a partially observable environment simultaneously while also performing a near-optimal sequence of actions online, taking into …

Monte Carlo Bayesian Reinforcement Learning

Bayesian reinforcement learning (BRL) encodes prior knowledge of the world in a model and represents uncertainty in model parameters by maintaining a probability distribution over them. This paper …

Nonparametric Bayesian Policy Priors for Reinforcement Learning

This work considers reinforcement learning in partially observable domains where the agent can query an expert for demonstrations and introduces priors that bias the agent towards models with both simple representations and simple policies, resulting in improved policy and model learning.

Reinforcement learning with limited reinforcement: using Bayes risk for active learning in POMDPs

This paper presents an approximation approach that allows the POMDP model parameters to be treated as additional hidden state in a "model-uncertainty" POMDP.

MCTS on model-based Bayesian Reinforcement Learning for efficient learning in Partially Observable environments

Bayesian model-based approaches promise the optimal solution to this fundamental issue of reinforcement learning: maintain a probability distribution over the possible dynamics of the POMDP and the current state, and devise an action-selection policy with respect to that distribution, explicitly reasoning over the agent's uncertainty.

A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes

This paper introduces the Bayes-Adaptive Partially Observable Markov Decision Processes, a new framework that can be used to simultaneously learn a model of the POMDP domain through interaction with the environment, and track the state of the system under partial observability.

Monte-Carlo Bayesian Reinforcement Learning Using a Compact Factored Representation

  • Bo Wu, Yan-Peng Feng
  • Computer Science
    2017 4th International Conference on Information Science and Control Engineering (ICISCE)
  • 2017
This paper proposes a novel Monte Carlo tree search approach for Bayesian reinforcement learning that uses a compact factored representation to solve the Bayesian reinforcement learning problem online.

A Model-Based Factored Bayesian Reinforcement Learning Approach

This work exploits a factored representation to describe the states and reduce the number of learning parameters, and adopts a Bayesian inference method to learn the unknown structure and parameters simultaneously.

Intelligent Model Learning Based on Variance for Bayesian Reinforcement Learning

This work considers a modular method to reinforcement learning that represents uncertainty of model parameters by maintaining probability distributions over them and proposes a principled method which utilizes the variance of Dirichlet distributions for determining when to learn and relearn the model.
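The variance-based trigger summarized above can be sketched directly from the closed-form variance of a Dirichlet distribution; the threshold value and the stop/relearn rule below are hypothetical placeholders, not the paper's tuned procedure.

```python
def dirichlet_variance(alpha):
    """Per-component variance of Dirichlet(alpha):
    Var[p_i] = a_i * (a0 - a_i) / (a0**2 * (a0 + 1)), where a0 = sum(alpha).
    """
    a0 = sum(alpha)
    return [a * (a0 - a) / (a0 ** 2 * (a0 + 1)) for a in alpha]

def should_relearn(alpha, threshold=0.01):
    # Hypothetical rule: keep learning a transition's parameters while
    # any component of its Dirichlet posterior is still high-variance.
    return max(dirichlet_variance(alpha)) > threshold
```

With few observed counts the variance is large and the rule says "learn"; as counts accumulate the variance shrinks toward zero and model updates can be skipped.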

A Bayesian Framework for Reinforcement Learning

It is proposed that the learning process estimate the full posterior distribution over models online; to determine behavior, a hypothesis is sampled from this distribution and the greedy policy with respect to that hypothesis is obtained by dynamic programming.
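The sample-then-act loop described here (often called Thompson sampling, or Bayesian dynamic programming) can be sketched for a small tabular MDP. The Dirichlet count posterior, reward table, and value-iteration settings below are illustrative assumptions.

```python
import random

def sample_model(counts, n_states, n_actions):
    """Sample one transition-model hypothesis from the Dirichlet posterior
    (normalized Gamma draws give a Dirichlet sample)."""
    T = {}
    for s in range(n_states):
        for a in range(n_actions):
            g = [random.gammavariate(c, 1.0) for c in counts[(s, a)]]
            z = sum(g)
            T[(s, a)] = [x / z for x in g]
    return T

def greedy_policy(T, R, n_states, n_actions, gamma=0.95, iters=200):
    """Greedy policy for the sampled hypothesis, via value iteration."""
    V = [0.0] * n_states
    for _ in range(iters):
        V = [max(R[s][a] + gamma * sum(p * V[s2]
                                       for s2, p in enumerate(T[(s, a)]))
                 for a in range(n_actions))
             for s in range(n_states)]
    return [max(range(n_actions),
                key=lambda a: R[s][a] + gamma * sum(
                    p * V[s2] for s2, p in enumerate(T[(s, a)])))
            for s in range(n_states)]
```

Acting greedily with respect to a sampled hypothesis, rather than the posterior mean, is what gives this scheme its exploration: uncertain models are occasionally sampled optimistically.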

An analytic solution to discrete Bayesian reinforcement learning

This work proposes a new algorithm, called BEETLE, for effective online learning that is computationally efficient while minimizing the amount of exploration, and takes a Bayesian model-based approach, framing RL as a partially observable Markov decision process.

Model based Bayesian Exploration

This paper explicitly represents uncertainty about the parameters of the model and builds probability distributions over Q-values from it; these distributions are used to compute a myopic approximation to the value of information for each action, and hence to select the action that best balances exploration and exploitation.
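The myopic value-of-information rule can be sketched from posterior samples of Q-values: information about an action is worth something only where it would change which action looks best. The sample-based representation and gain formulas below follow the usual myopic VPI recipe and are assumptions for illustration, not this paper's exact procedure.

```python
def select_action(q_samples):
    """Myopic value-of-perfect-information action selection (sketch).

    q_samples[a] is a list of Q-value samples for action a, drawn from the
    posterior over models.  Each action is scored by its mean Q-value plus
    a sample estimate of the value of perfect information about it.
    """
    means = [sum(qs) / len(qs) for qs in q_samples]
    order = sorted(range(len(means)), key=lambda a: -means[a])
    best, second = order[0], order[1]
    vpi = []
    for a, qs in enumerate(q_samples):
        if a == best:
            # learning helps only if the best action is worse than believed
            gains = [max(0.0, means[second] - q) for q in qs]
        else:
            # learning helps only if this action beats the current best
            gains = [max(0.0, q - means[best]) for q in qs]
        vpi.append(sum(gains) / len(gains))
    return max(range(len(means)), key=lambda a: means[a] + vpi[a])
```

Actions whose posterior Q-distribution overlaps the current best action's mean get an exploration bonus; actions that are clearly dominated get none.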

Active Learning in Partially Observable Markov Decision Processes

Results show good performance of the algorithm even in large problems: the most useful parameters of the model are learned quickly and the agent still accumulates high reward throughout the process.

Bayesian sparse sampling for on-line reward optimization

The idea is to grow a sparse lookahead tree, intelligently, by exploiting information in a Bayesian posterior---rather than enumerate action branches (standard sparse sampling) or compensate myopically (value of perfect information).

Point-Based Value Iteration for Continuous POMDPs

It is demonstrated that the value function for continuous POMDPs is convex in the beliefs over continuous state spaces, and piecewise-linear convex for the particular case of discrete observations and actions but still continuous states.

Using Linear Programming for Bayesian Exploration in Markov Decision Processes

Ideas for making this model of the environment as a Markov decision process computationally tractable are explored, and finite-length trajectories are sampled from the infinite tree using ideas based on sparse sampling.

Learning in Graphical Models

This paper presents an introduction to inference for Bayesian networks and a view of the EM algorithm that justifies incremental, sparse and other variants, as well as an information-theoretic analysis of hard and soft assignment methods for clustering.

Scalable Internal-State Policy-Gradient Methods for POMDPs

Several improved algorithms for learning policies with memory in an infinite-horizon setting are developed — directly when a known model of the environment is available, and via simulation otherwise.
