Corpus ID: 17395633

Model-based Bayesian Reinforcement Learning in Partially Observable Domains

@inproceedings{Poupart2008ModelbasedBR,
  title={Model-based Bayesian Reinforcement Learning in Partially Observable Domains},
  author={Pascal Poupart and Nikos A. Vlassis},
  booktitle={ISAIM},
  year={2008}
}
Bayesian reinforcement learning in partially observable domains is notoriously difficult, in part due to the unknown form of the beliefs and the optimal value function. We show that beliefs represented by mixtures of products of Dirichlet distributions are closed under belief updates for factored domains. Belief monitoring algorithms that use this mixture representation are proposed. We also show that the optimal value function is a linear combination of products of Dirichlets for factored…
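The closure property described in the abstract can be illustrated with a minimal sketch. This is an illustrative assumption, not the authors' implementation: it uses a tiny flat (unfactored) domain, and the class name DirichletMixtureBelief, the count-array layout, and the toy 2-state example are all hypothetical. It shows why a belief that is a mixture of products of Dirichlets stays in that family after conditioning on an action-observation pair, and why the number of mixture components grows with each update, a growth that practical belief monitoring algorithms must keep in check.

```python
# Minimal sketch of a Dirichlet-mixture belief update for Bayesian RL in a POMDP
# with unknown transition (T) and observation (O) parameters. Assumed, illustrative code.
import numpy as np

class DirichletMixtureBelief:
    """Each component: (weight, hidden state s, Dirichlet counts for T and O)."""
    def __init__(self, components):
        self.components = components  # list of (w, s, T_counts, O_counts)

    def update(self, a, z, n_states):
        new = []
        for w, s, T, O in self.components:
            for s_next in range(n_states):
                # Expected transition prob E[P(s'|s,a)] under the Dirichlet counts
                p_trans = T[s, a, s_next] / T[s, a].sum()
                # Expected observation prob E[P(z|s',a)]
                p_obs = O[s_next, a, z] / O[s_next, a].sum()
                w_new = w * p_trans * p_obs
                if w_new <= 0:
                    continue
                # Multiplying a Dirichlet by one parameter gives a Dirichlet with
                # the corresponding count incremented: the family is closed.
                T_new, O_new = T.copy(), O.copy()
                T_new[s, a, s_next] += 1
                O_new[s_next, a, z] += 1
                new.append((w_new, s_next, T_new, O_new))
        total = sum(w for w, *_ in new)
        self.components = [(w / total, s, T, O) for w, s, T, O in new]

# Tiny illustrative domain: 2 states, 2 actions, 2 observations, uniform priors.
S, A, Z = 2, 2, 2
prior = DirichletMixtureBelief(
    [(0.5, s, np.ones((S, A, S)), np.ones((S, A, Z))) for s in range(S)])
prior.update(a=0, z=1, n_states=S)
print(len(prior.components))  # components multiply with each update (here 4)
```

In a factored domain the same idea applies per factor, so each component stores a product of Dirichlets over the rows of the factored transition and observation distributions.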

Citations

Bayesian Reinforcement Learning
  • P. Poupart
  • Computer Science
  • Encyclopedia of Machine Learning
  • 2010
TLDR: This chapter surveys recent lines of work that use Bayesian techniques for reinforcement learning by explicitly maintaining a distribution over various quantities such as the parameters of the model, the value function, the policy or its gradient.
Acting and Bayesian Reinforcement Structure Learning of Partially Observable Environment
This article shows how to learn both the structure and the parameters of a partially observable environment simultaneously, while also performing online a near-optimal sequence of actions, taking into…
Monte Carlo Bayesian Reinforcement Learning
Bayesian reinforcement learning (BRL) encodes prior knowledge of the world in a model and represents uncertainty in model parameters by maintaining a probability distribution over them. This paper…
Nonparametric Bayesian Policy Priors for Reinforcement Learning
TLDR: This work considers reinforcement learning in partially observable domains where the agent can query an expert for demonstrations and introduces priors that bias the agent towards models with both simple representations and simple policies, resulting in improved policy and model learning.
Reinforcement learning with limited reinforcement: using Bayes risk for active learning in POMDPs
TLDR: This paper presents an approximation approach that allows us to treat the POMDP model parameters as additional hidden state in a "model-uncertainty" POMDP.
A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes
TLDR: This paper introduces the Bayes-Adaptive Partially Observable Markov Decision Process, a new framework that can be used to simultaneously learn a model of the POMDP domain through interaction with the environment and track the state of the system under partial observability.
Model-based Bayesian Reinforcement Learning in Factored Markov Decision Process
TLDR: The proposed model-based factored Bayesian reinforcement learning (F-BRL) approach can effectively reduce the number of learning parameters and enable online learning for dynamic systems with thousands of states.
Monte-Carlo Bayesian Reinforcement Learning Using a Compact Factored Representation
  • Bo Wu, Yan-Peng Feng
  • Computer Science
  • 2017 4th International Conference on Information Science and Control Engineering (ICISCE)
  • 2017
TLDR: This paper proposes a novel Monte Carlo tree search approach for Bayesian reinforcement learning that uses a compact factored representation to solve the Bayesian reinforcement learning problem online.
MCTS on model-based Bayesian Reinforcement Learning for efficient learning in Partially Observable environments
TLDR: This work focuses on solving partially observable domains, typically modeled as Partially Observable Markov Decision Processes (POMDPs), which are well known to be hard to solve due to uncertainty as a result of stochastic transitions, partial observability, and unknown dynamics.

References

Showing 1-10 of 20 references
A Bayesian Framework for Reinforcement Learning
TLDR: It is proposed that the learning process estimate online the full posterior distribution over models; to determine behavior, a hypothesis is sampled from this distribution and the greedy policy with respect to that hypothesis is obtained by dynamic programming.
An analytic solution to discrete Bayesian reinforcement learning
TLDR: This work proposes a new algorithm, called BEETLE, for effective online learning that is computationally efficient while minimizing the amount of exploration, and takes a Bayesian model-based approach, framing RL as a partially observable Markov decision process.
Model based Bayesian Exploration
TLDR: This paper explicitly represents uncertainty about the parameters of the model and builds probability distributions over Q-values based on these, which are used to compute a myopic approximation to the value of information for each action and hence to select the action that best balances exploration and exploitation.
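The myopic value-of-information idea summarized above can be made concrete with a small sketch. This is an assumed illustration, not the paper's exact algorithm: the q_samples dictionary, the vpi and select_action helpers, and the toy two-action example are all hypothetical, with Q-value uncertainty represented by posterior samples.

```python
# Assumed sketch of myopic value-of-perfect-information (VPI) action selection
# over uncertain Q-values, in the spirit of the summary above.
import numpy as np

def vpi(q_samples, a, best, second_best):
    """Expected gain from learning the true value of action a (sample approximation)."""
    q = q_samples[a]
    if a == best:
        # Gain only if the current best turns out worse than the runner-up.
        gain = np.maximum(q_samples[second_best].mean() - q, 0.0)
    else:
        # Gain only if a turns out better than the current best.
        gain = np.maximum(q - q_samples[best].mean(), 0.0)
    return gain.mean()

def select_action(q_samples):
    means = {a: s.mean() for a, s in q_samples.items()}
    ranked = sorted(means, key=means.get, reverse=True)
    best, second = ranked[0], ranked[1]
    # Balance exploitation (posterior mean) against exploration (information value).
    return max(q_samples, key=lambda a: means[a] + vpi(q_samples, a, best, second))

rng = np.random.default_rng(0)
q_samples = {0: rng.normal(1.0, 0.1, 1000),   # well-known, decent action
             1: rng.normal(0.9, 1.0, 1000)}   # uncertain action worth probing
print(select_action(q_samples))  # the uncertain action wins via its information value
```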
Active Learning in Partially Observable Markov Decision Processes
TLDR: Results show good performance of the algorithm even in large problems: the most useful parameters of the model are learned quickly and the agent still accumulates high reward throughout the process.
Bayesian sparse sampling for on-line reward optimization
TLDR: The idea is to grow a sparse lookahead tree intelligently by exploiting information in a Bayesian posterior, rather than enumerating action branches (standard sparse sampling) or compensating myopically (value of perfect information).
Point-Based Value Iteration for Continuous POMDPs
TLDR: It is demonstrated that the value function for continuous POMDPs is convex in the beliefs over continuous state spaces, and piecewise-linear convex for the particular case of discrete observations and actions but still continuous states.
Using Linear Programming for Bayesian Exploration in Markov Decision Processes
TLDR: Ideas for making this Markov Decision Process model of the environment computationally tractable are explored, and finite-length trajectories from the infinite tree are sampled using ideas based on sparse sampling.
Learning in Graphical Models
TLDR: This paper presents an introduction to inference for Bayesian networks and a view of the EM algorithm that justifies incremental, sparse and other variants, as well as an information-theoretic analysis of hard and soft assignment methods for clustering.
Scalable Internal-State Policy-Gradient Methods for POMDPs
TLDR: Several improved algorithms for learning policies with memory in an infinite-horizon setting are developed: directly when a known model of the environment is available, and via simulation otherwise.
Scaling Internal-State Policy-Gradient Methods for POMDPs
Policy-gradient methods have received increased attention recently as a mechanism for learning to act in partially observable environments. They have shown promise for problems admitting memoryless…