Corpus ID: 237635098

Regularization Guarantees Generalization in Bayesian Reinforcement Learning through Algorithmic Stability

@article{Tamar2021RegularizationGG,
  title={Regularization Guarantees Generalization in Bayesian Reinforcement Learning through Algorithmic Stability},
  author={Aviv Tamar and Daniel Soudry and Ev Zisselman},
  journal={ArXiv},
  year={2021},
  volume={abs/2109.11792}
}
In the Bayesian reinforcement learning (RL) setting, a prior distribution over the unknown problem parameters (the rewards and transitions) is assumed, and a policy that optimizes the (posterior) expected return is sought. A common approximation, recently popularized as meta-RL, is to train the agent on a sample of N problem instances from the prior, with the hope that for large enough N, good generalization behavior to an unseen test instance will be obtained. In this work, we…
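For concreteness, the objective described in the abstract can be written as follows; this is a standard formulation and the notation is illustrative, not quoted from the paper. The Bayesian objective averages the return over the prior P on MDPs M, and the meta-RL approximation replaces that average with an empirical one over N sampled instances:

J(\pi) = \mathbb{E}_{M \sim P}\big[\, \mathbb{E}^{\pi, M}\big[ \textstyle\sum_t r_t \big] \big],
\qquad
\hat{J}_N(\pi) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{E}^{\pi, M_i}\big[ \textstyle\sum_t r_t \big],
\quad M_1, \dots, M_N \sim P.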
1 Citation
Meta Reinforcement Learning with Finite Training Tasks - a Density Estimation Approach
TLDR
This work directly learns the task distribution using density-estimation techniques, then trains a policy on the learned task distribution, and demonstrates that the regularization implied by the kernel density estimation method is useful in practice when plugged into the state-of-the-art VariBAD meta-RL algorithm.
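A minimal sketch of the density-estimation idea in this summary, assuming tasks are parameterized by real-valued vectors; make_task_env and train_policy are hypothetical placeholders, not from the cited paper:

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# N training task parameters, assumed drawn from the unknown prior.
train_tasks = rng.normal(loc=0.0, scale=1.0, size=(20, 2))

# Fit a kernel density estimate; its bandwidth is the regularizer
# that the summary credits with improved generalization.
kde = gaussian_kde(train_tasks.T)  # expects shape (dim, n_samples)

# Train on tasks resampled from the smoothed distribution rather
# than only on the N original instances.
for _ in range(1000):
    task_params = kde.resample(1).T[0]
    # env = make_task_env(task_params)  # hypothetical constructor
    # train_policy(env)                 # hypothetical trainer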

References

Showing 1-10 of 39 references
A unified view of entropy-regularized Markov decision processes
TLDR
A general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs) is proposed, showing that using the conditional entropy of the joint state-action distributions as regularization yields a dual optimization problem closely resembling the Bellman optimality equations.
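For reference, the entropy-regularized (soft) Bellman optimality equation in its standard form with temperature \tau (a generic statement, not quoted from this reference) replaces the max with a log-sum-exp:

V^*(s) = \tau \log \sum_{a} \exp\!\left( \frac{r(s,a) + \gamma\, \mathbb{E}_{s' \sim p(\cdot \mid s,a)}[V^*(s')]}{\tau} \right),

which recovers the unregularized Bellman equation as \tau \to 0.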
Differentiable Meta-Learning of Bandit Policies
TLDR
This work parameterizes policies in a differentiable way and optimizes them by policy gradients, an approach that is pleasantly general and easy to implement, and observes that neural network policies can learn implicit biases expressed only through the sampled instances.
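The policy-gradient optimization mentioned here, in its generic REINFORCE form (a textbook statement, not the paper's exact estimator), differentiates the expected return over sampled bandit instances M:

\nabla_\theta\, \mathbb{E}_{M \sim P}\, \mathbb{E}^{\pi_\theta, M}[R] = \mathbb{E}\!\left[ R \sum_t \nabla_\theta \log \pi_\theta(a_t \mid h_t) \right],

where h_t is the interaction history observed by the bandit policy.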
Taming the Noise in Reinforcement Learning via Soft Updates
TLDR
G-learning is proposed, a new off-policy learning algorithm that regularizes the noise in the space of optimal actions by penalizing deterministic policies at the beginning of learning, which naturally enables incorporating prior distributions over optimal actions when available.
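The soft update at the heart of G-learning has the following form, with prior policy \rho and an inverse temperature \beta that grows during learning (sketched from the standard presentation; notation may differ from the paper):

G(s,a) = r(s,a) + \gamma\, \mathbb{E}_{s'}\!\left[ \frac{1}{\beta} \log \sum_{a'} \rho(a' \mid s')\, e^{\beta\, G(s', a')} \right].

Small \beta keeps the policy close to \rho, penalizing early determinism; as \beta \to \infty the update approaches the hard max of Q-learning.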
Offline Meta Learning of Exploration
TLDR
This work extends the recently proposed VariBAD BRL algorithm to the off-policy setting, and demonstrates learning of approximately Bayes-optimal exploration strategies from offline data using deep neural networks.
Bayesian Reinforcement Learning via Deep, Sparse Sampling
TLDR
An optimism-free Bayes-adaptive algorithm is proposed that induces deeper and sparser exploration, with a theoretical bound on its performance relative to the Bayes-optimal policy and lower computational complexity.
Near-optimal Regret Bounds for Reinforcement Learning
TLDR
This work presents a reinforcement learning algorithm with total regret Õ(DS√(AT)) after T steps for any unknown MDP with S states, A actions per state, and diameter D, and proposes a new parameter: an MDP has diameter D if for any pair of states s, s' there is a policy that moves from s to s' in at most D steps in expectation.
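In symbols, and up to logarithmic factors, the two statements in this summary are:

D = \max_{s \neq s'} \min_{\pi} \mathbb{E}\big[ T^{\pi}(s \to s') \big],
\qquad
\mathrm{Regret}(T) = \tilde{O}\big( D S \sqrt{A T} \big),

where T^{\pi}(s \to s') is the number of steps policy \pi needs to reach s' from s.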
Bayesian Reinforcement Learning: A Survey
TLDR
This survey provides an in-depth review of the role of Bayesian methods in the reinforcement learning (RL) paradigm, together with a comprehensive overview of Bayesian RL algorithms and their theoretical and empirical properties.
PAC-BUS: Meta-Learning Bounds via PAC-Bayes and Uniform Stability
TLDR
A probably approximately correct (PAC) bound is derived for gradient-based meta-learning using two different generalization frameworks in order to deal with the qualitatively different challenges of generalization at the “base” and “meta” levels.
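For context, one standard PAC-Bayes ingredient of such bounds is the following inequality (a generic form via Pinsker's inequality, not the paper's combined PAC-BUS bound): with probability at least 1 - \delta over an n-sample, simultaneously for all posteriors Q,

L(Q) \le \hat{L}(Q) + \sqrt{ \frac{ \mathrm{KL}(Q \,\|\, P) + \ln \frac{2\sqrt{n}}{\delta} }{ 2n } },

where P is a data-independent prior over hypotheses; the uniform-stability component controls generalization at the base level.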
VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning
TLDR
This paper introduces variational Bayes-Adaptive Deep RL (variBAD), a way to meta-learn to perform approximate inference in an unknown environment, and incorporate task uncertainty directly during action selection and achieves higher online return than existing methods.
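Schematically, following the general variBAD formulation (the symbols here are the usual ones, not quoted from the abstract), the policy is conditioned on an approximate posterior over a task embedding m inferred from the history \tau_{:t}, and the encoder is trained with an ELBO:

a_t \sim \pi\big(\cdot \mid s_t,\, q_\phi(m \mid \tau_{:t})\big),
\qquad
\mathrm{ELBO}_t = \mathbb{E}_{q_\phi(m \mid \tau_{:t})}\big[ \log p_\theta(\tau \mid m) \big] - \mathrm{KL}\big( q_\phi(m \mid \tau_{:t}) \,\|\, p(m) \big).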
Stability and Generalization of Learning Algorithms that Converge to Global Optima
TLDR
This work derives black-box stability results that depend only on the convergence of a learning algorithm and the geometry around the minimizers of the loss function, establishing novel generalization bounds for learning algorithms that converge to global minima.
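The stability notion at play here, and in the title of the main paper, is uniform stability: an algorithm A is \epsilon-uniformly stable if replacing any single training example changes its loss on any point z by at most \epsilon, which in turn bounds the expected generalization gap (the classical Bousquet and Elisseeff argument):

\sup_{z} \big| \ell(A(S), z) - \ell(A(S^{(i)}), z) \big| \le \epsilon
\;\Longrightarrow\;
\big| \mathbb{E}\big[ L(A(S)) - \hat{L}_S(A(S)) \big] \big| \le \epsilon,

where S^{(i)} is S with its i-th example replaced.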
...