Corpus ID: 232380390

Bellman: A Toolbox for Model-Based Reinforcement Learning in TensorFlow

@article{Mcleod2021BellmanAT,
  title={Bellman: A Toolbox for Model-Based Reinforcement Learning in TensorFlow},
  author={John Mcleod and Hrvoje Stoji{\'c} and Vincent Adam and Dongho Kim and Jordi Grau-Moya and Peter Vrancx and Felix Leibfried},
  journal={ArXiv},
  year={2021},
  volume={abs/2103.14407}
}
In the past decade, model-free reinforcement learning (RL) has provided solutions to challenging domains such as robotics. Model-based RL, where agents learn a model of the environment in order to explicitly plan ahead, shows the prospect of being more sample-efficient than model-free methods in terms of agent-environment interactions, because the model enables the agent to extrapolate to unseen situations. More recently, model-based methods have shown superior results compared to model-free…
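
A minimal sketch of the learn-a-model-then-plan loop the abstract describes, assuming a toy setup: a small neural dynamics model fit to observed transitions, and a random-shooting planner that simulates candidate action sequences in that model. All names, shapes, and the reward function are illustrative assumptions, not the Bellman toolbox's API.

```python
import numpy as np
import tensorflow as tf

OBS_DIM, ACT_DIM, HORIZON, N_CANDIDATES = 4, 1, 10, 256

# Learned dynamics model: predicts the next state from (state, action).
dynamics = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu",
                          input_shape=(OBS_DIM + ACT_DIM,)),
    tf.keras.layers.Dense(OBS_DIM),
])
dynamics.compile(optimizer="adam", loss="mse")

def fit_dynamics(states, actions, next_states):
    """One supervised update on a batch of real transitions."""
    inputs = np.concatenate([states, actions], axis=-1)
    dynamics.fit(inputs, next_states, epochs=1, verbose=0)

def plan(state, reward_fn):
    """Return the first action of the best sampled action sequence."""
    seqs = np.random.uniform(-1.0, 1.0, size=(N_CANDIDATES, HORIZON, ACT_DIM))
    obs = np.repeat(state[None], N_CANDIDATES, axis=0)
    returns = np.zeros(N_CANDIDATES)
    for t in range(HORIZON):
        inp = np.concatenate([obs, seqs[:, t]], axis=-1).astype(np.float32)
        nxt = dynamics(inp).numpy()
        returns += reward_fn(obs, seqs[:, t], nxt)  # reward_fn assumed vectorized
        obs = nxt
    return seqs[np.argmax(returns), 0]
```

Because the planner only ever queries the learned model, each real environment step can be amortized over many simulated rollouts, which is the source of the sample-efficiency gains the abstract refers to.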


MBRL-Lib: A Modular Library for Model-based Reinforcement Learning
TLDR
MBRL-Lib is designed as a platform for both researchers, to easily develop, debug, and compare new algorithms, and non-expert users, to lower the barrier to deploying state-of-the-art algorithms.

References

SHOWING 1-10 OF 43 REFERENCES
Model-Ensemble Trust-Region Policy Optimization
TLDR
This paper analyzes the behavior of vanilla model-based reinforcement learning methods when deep neural networks are used to learn both the model and the policy, and shows that the learned policy tends to exploit regions where insufficient data is available for the model to be learned, causing instability in training.
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
TLDR
This paper proposes a new algorithm called probabilistic ensembles with trajectory sampling (PETS) that combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation, which matches the asymptotic performance of model-free algorithms on several challenging benchmark tasks, while requiring significantly fewer samples.
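
A rough sketch of the PETS-style machinery summarized above, assuming a small ensemble of probabilistic networks (each predicting a Gaussian over the next state) and particle-based trajectory sampling in which every particle is propagated through a randomly chosen ensemble member. Sizes and helper names are illustrative, not the paper's code.

```python
import numpy as np
import tensorflow as tf

OBS_DIM, ACT_DIM, ENSEMBLE_SIZE = 4, 1, 5

def make_member():
    # Each member outputs the mean and log-variance of the next state.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="swish",
                              input_shape=(OBS_DIM + ACT_DIM,)),
        tf.keras.layers.Dense(2 * OBS_DIM),
    ])

ensemble = [make_member() for _ in range(ENSEMBLE_SIZE)]

def propagate_particles(obs, actions):
    """One trajectory-sampling step: each particle uses a random member."""
    member_ids = np.random.randint(ENSEMBLE_SIZE, size=obs.shape[0])
    inputs = np.concatenate([obs, actions], axis=-1).astype(np.float32)
    next_obs = np.empty_like(obs)
    for m, net in enumerate(ensemble):
        mask = member_ids == m
        if not mask.any():
            continue
        out = net(inputs[mask]).numpy()
        mean, logvar = out[:, :OBS_DIM], out[:, OBS_DIM:]
        # Sample from this member's predictive Gaussian.
        next_obs[mask] = mean + np.exp(0.5 * logvar) * np.random.randn(*mean.shape)
    return next_obs
```

Sampling both the ensemble member and the Gaussian noise propagates epistemic and aleatoric uncertainty through the rollout, which is what keeps the planner from trusting the model where it has seen no data.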
Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion
TLDR
Stochastic ensemble value expansion (STEVE), a novel model-based technique that mitigates the impact of model error by dynamically interpolating between model rollouts of various horizon lengths for each individual example, outperforms model-free baselines on challenging continuous control benchmarks with an order-of-magnitude increase in sample efficiency.
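
A toy illustration of the interpolation idea described above, assuming the per-horizon target estimates (e.g., from an ensemble of models and value functions) have already been computed; the inverse-variance weighting below is a simplification of the paper's scheme.

```python
import numpy as np

def steve_target(horizon_targets):
    """horizon_targets: [n_horizons, n_estimates] candidate TD targets."""
    means = horizon_targets.mean(axis=1)
    variances = horizon_targets.var(axis=1) + 1e-8  # avoid division by zero
    weights = 1.0 / variances
    weights /= weights.sum()
    # High-variance (unreliable) horizons are down-weighted automatically.
    return float((weights * means).sum())

# Example: the estimates for the longest rollout disagree, so the combined
# target leans on the short, reliable horizons.
targets = np.array([[1.00, 1.02, 0.98],   # 1-step targets
                    [1.10, 1.05, 1.15],   # 2-step targets
                    [0.20, 2.50, 1.90]])  # 3-step targets (high variance)
print(steve_target(targets))
```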
Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
TLDR
It is demonstrated that neural network dynamics models can in fact be combined with model predictive control (MPC) to achieve excellent sample complexity in a model-based reinforcement learning algorithm, producing stable and plausible gaits that accomplish various complex locomotion tasks.
Soft Actor-Critic Algorithms and Applications
TLDR
Soft Actor-Critic (SAC), the recently introduced off-policy actor-critic algorithm based on the maximum entropy RL framework, achieves state-of-the-art performance, outperforming prior on-policy and off-policy methods in sample efficiency and asymptotic performance.
Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control
TLDR
This work proposes a model-based RL framework built on probabilistic model predictive control (MPC) with Gaussian processes (GPs) to incorporate model uncertainty into long-term predictions, thereby reducing the impact of model errors, and provides theoretical guarantees for first-order optimality in GP-based transition models with deterministic approximate inference for long-term planning.
Exploring Model-based Planning with Policy Networks
TLDR
This paper proposes a novel MBRL algorithm, model-based policy planning (POPLIN), that combines policy networks with online planning and shows that POPLIN obtains state-of-the-art performance in the MuJoCo benchmarking environments, being about 3x more sample-efficient than state-of-the-art algorithms such as PETS, TD3, and SAC.
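
A loose sketch of the policy-guided planning idea summarized above: rather than sampling action sequences uniformly (as in the random-shooting sketch after the abstract), the planner rolls a policy network forward in the learned model and perturbs its actions. The `dynamics` argument, dimensions, and noise scale are illustrative assumptions carried over from the earlier sketches, not the paper's implementation.

```python
import numpy as np
import tensorflow as tf

OBS_DIM, ACT_DIM, HORIZON, N_CANDIDATES = 4, 1, 10, 256

policy = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="tanh", input_shape=(OBS_DIM,)),
    tf.keras.layers.Dense(ACT_DIM, activation="tanh"),
])

def policy_guided_candidates(state, dynamics, noise_std=0.1):
    """Roll the policy forward in the model, perturbing its actions."""
    obs = np.repeat(state[None], N_CANDIDATES, axis=0).astype(np.float32)
    seqs = np.zeros((N_CANDIDATES, HORIZON, ACT_DIM), dtype=np.float32)
    for t in range(HORIZON):
        act = policy(obs).numpy() + noise_std * np.random.randn(N_CANDIDATES, ACT_DIM)
        seqs[:, t] = act
        obs = dynamics(np.concatenate([obs, act.astype(np.float32)],
                                      axis=-1)).numpy()
    return seqs  # score these in the model as in the earlier planner
```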
Challenges of Real-World Reinforcement Learning
TLDR
A set of nine unique challenges that must be addressed to productionize RL for real-world problems is presented, along with an example domain that has been modified to exhibit these challenges as a testbed for practical RL research.
Benchmarking Model-Based Reinforcement Learning
TLDR
This paper gathers a wide collection of MBRL algorithms, proposes over 18 benchmarking environments specially designed for MBRL, and describes three key challenges for future MBRL research: the dynamics bottleneck, the planning horizon dilemma, and the early-termination dilemma.
Baconian: A Unified Open-source Framework for Model-Based Reinforcement Learning
TLDR
This work develops a flexible and modularized framework, Baconian, which allows researchers to easily implement an MBRL testbed by customizing or building upon the provided modules and algorithms.