Bellman: A Toolbox for Model-Based Reinforcement Learning in TensorFlow
@article{Mcleod2021BellmanAT,
  title   = {Bellman: A Toolbox for Model-Based Reinforcement Learning in TensorFlow},
  author  = {John Mcleod and Hrvoje Stoji{\'c} and Vincent Adam and Dongho Kim and Jordi Grau-Moya and Peter Vrancx and Felix Leibfried},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2103.14407}
}
In the past decade, model-free reinforcement learning (RL) has provided solutions to challenging domains such as robotics. Model-based RL (where agents learn a model of the environment in order to explicitly plan ahead) shows the prospect of being more sample-efficient than model-free methods in terms of agent-environment interactions, because the model enables the agent to extrapolate to unseen situations. In the more recent past, model-based methods have shown superior results compared to model-free…
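To make the planning idea concrete, here is a minimal, self-contained sketch of the model-based RL loop the abstract describes: collect real transitions, fit a dynamics model, and plan against the model instead of the environment. The toy 1-D environment, the linear least-squares model, and the one-step planner are illustrative assumptions, not part of the Bellman toolbox API.

```python
import numpy as np

rng = np.random.default_rng(0)

def env_step(s, a):
    """Toy 1-D dynamics with noise; reward is negative distance to origin."""
    s_next = 0.9 * s + 0.5 * a + 0.01 * rng.normal()
    return s_next, -abs(s_next)

# 1) Collect real transitions with a random behaviour policy.
data = []
s = 1.0
for _ in range(200):
    a = rng.uniform(-1.0, 1.0)
    s_next, r = env_step(s, a)
    data.append((s, a, s_next))
    s = s_next

# 2) Fit a dynamics model s' ~ w1*s + w2*a by least squares.
X = np.array([(s, a) for s, a, _ in data])
y = np.array([sn for _, _, sn in data])
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# 3) Plan with the learned model: pick the action whose predicted next
#    state is closest to the origin (one-step lookahead for brevity).
def plan(s, candidates=np.linspace(-1, 1, 21)):
    preds = w[0] * s + w[1] * candidates
    return candidates[np.argmin(np.abs(preds))]

print("planned action at s=1.0:", plan(1.0))
```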
One Citation
MBRL-Lib: A Modular Library for Model-based Reinforcement Learning
- ArXiv, 2021
MBRL-Lib is designed as a platform both for researchers, to easily develop, debug, and compare new algorithms, and for non-expert users, to lower the barrier to entry for deploying state-of-the-art algorithms.
References
Showing 1-10 of 43 references
Model-Ensemble Trust-Region Policy Optimization
- ICLR, 2018
This paper analyzes the behavior of vanilla model-based reinforcement learning methods when deep neural networks are used to learn both the model and the policy, and shows that the learned policy tends to exploit regions where insufficient data is available for the model to be learned, causing instability in training.
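A minimal sketch of the ensemble remedy ME-TRPO proposes (the toy linear models and interfaces below are assumptions for illustration, not the paper's code): fit K dynamics models on bootstrapped data, then generate imagined rollouts by sampling a different model at each step, so the policy cannot exploit the idiosyncratic errors of any single model.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))            # (state, action) pairs
y = 0.9 * X[:, 0] + 0.5 * X[:, 1]        # true next state (toy, noiseless)

K = 5
ensemble = []
for _ in range(K):
    idx = rng.integers(0, len(X), len(X))        # bootstrap resample
    w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    ensemble.append(w)

def imagined_rollout(s, policy, horizon=10):
    traj = [s]
    for _ in range(horizon):
        a = policy(s)
        w = ensemble[rng.integers(K)]            # random model per step
        s = w[0] * s + w[1] * a
        traj.append(s)
    return traj

print(imagined_rollout(1.0, policy=lambda s: -0.5 * s))
```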
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
- NeurIPS, 2018
This paper proposes a new algorithm, probabilistic ensembles with trajectory sampling (PETS), that combines uncertainty-aware deep-network dynamics models with sampling-based uncertainty propagation; PETS matches the asymptotic performance of model-free algorithms on several challenging benchmark tasks while requiring significantly fewer samples.
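A sketch of PETS-style trajectory sampling (illustrative assumptions, not the paper's code): each ensemble member predicts a Gaussian over the next state, and P particles are propagated by repeatedly sampling from the member they were assigned to, so both observation noise and model disagreement show up in the spread of imagined returns.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical probabilistic ensemble: (weight, noise std) per member.
ensemble = [(0.88, 0.05), (0.92, 0.04), (0.90, 0.06)]

def propagate(s0, actions, particles=20):
    P, H = particles, len(actions)
    assign = rng.integers(len(ensemble), size=P)   # particle -> model
    s = np.full(P, s0)
    returns = np.zeros(P)
    for t in range(H):
        w = np.array([ensemble[m][0] for m in assign])
        sd = np.array([ensemble[m][1] for m in assign])
        s = w * s + 0.5 * actions[t] + sd * rng.normal(size=P)
        returns += -np.abs(s)                      # toy reward
    return returns.mean()                          # score this action plan

plan = np.zeros(5)
print("expected return of zero plan:", propagate(1.0, plan))
```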
Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion
- NeurIPS, 2018
Stochastic ensemble value expansion (STEVE) is a novel model-based technique that addresses compounding model errors by dynamically interpolating between model rollouts of various horizon lengths for each individual example; it outperforms model-free baselines on challenging continuous control benchmarks with an order-of-magnitude increase in sample efficiency.
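A small sketch of the STEVE interpolation (the shapes and toy numbers are assumptions, not the paper's code): given candidate TD targets from rollouts of length 1..H, each estimated by an ensemble, weight each horizon by the inverse variance of its ensemble estimates, so unreliable long-horizon model rollouts are down-weighted per example.

```python
import numpy as np

# targets[h, k]: target from horizon h+1 according to ensemble member k.
targets = np.array([[1.00, 1.02, 0.98],    # 1-step: low spread
                    [1.10, 0.90, 1.05],    # 2-step: more spread
                    [1.40, 0.60, 1.10]])   # 3-step: high spread

mean = targets.mean(axis=1)
var = targets.var(axis=1) + 1e-8           # avoid division by zero
w = (1.0 / var) / (1.0 / var).sum()        # inverse-variance weights
steve_target = (w * mean).sum()
print("weights:", np.round(w, 3), "target:", round(steve_target, 3))
```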
Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
- ICRA, 2018
It is demonstrated that neural network dynamics models can in fact be combined with model predictive control (MPC) to achieve excellent sample complexity in a model-based reinforcement learning algorithm, producing stable and plausible gaits that accomplish various complex locomotion tasks.
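A sketch of random-shooting MPC as used with learned neural dynamics models (a stand-in linear model is assumed here, purely for illustration): sample N candidate action sequences, score each under the model, execute the first action of the best sequence, then replan at the next step.

```python
import numpy as np

rng = np.random.default_rng(0)
model = lambda s, a: 0.9 * s + 0.5 * a     # stand-in learned dynamics

def mpc_action(s0, horizon=10, n_candidates=500):
    seqs = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
    s = np.full(n_candidates, s0)
    ret = np.zeros(n_candidates)
    for t in range(horizon):
        s = model(s, seqs[:, t])
        ret += -np.abs(s)                  # toy reward: stay near origin
    return seqs[np.argmax(ret), 0]         # first action of best sequence

print("MPC action at s=1.0:", mpc_action(1.0))
```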
Soft Actor-Critic Algorithms and Applications
- ArXiv, 2018
Soft Actor-Critic (SAC), the recently introduced off-policy actor-critic algorithm based on the maximum-entropy RL framework, achieves state-of-the-art performance, outperforming prior on-policy and off-policy methods in sample efficiency and asymptotic performance.
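For reference, the standard maximum-entropy objective SAC optimizes, where $\alpha$ is the temperature and $\rho_\pi$ the state-action distribution induced by the policy:

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
```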
Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control
- AISTATS, 2018
This work proposes a model-based RL framework built on probabilistic model predictive control, using Gaussian processes to incorporate model uncertainty into long-term predictions and thereby reduce the impact of model errors; it also provides theoretical guarantees of first-order optimality for GP transition models with deterministic approximate inference for long-term planning.
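The model uncertainty in question comes from the standard GP predictive posterior: for a test input $x_*$, training targets $\mathbf{y}$ with kernel matrix $K$, kernel vector $k_*$, and noise variance $\sigma_n^2$,

```latex
\mu_*(x_*) = k_*^{\top} \left(K + \sigma_n^2 I\right)^{-1} \mathbf{y},
\qquad
\sigma_*^2(x_*) = k(x_*, x_*) - k_*^{\top} \left(K + \sigma_n^2 I\right)^{-1} k_*
```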
Exploring Model-based Planning with Policy Networks
- ICLR, 2020
This paper proposes a novel MBRL algorithm, model-based policy planning (POPLIN), that combines policy networks with online planning, and shows that POPLIN obtains state-of-the-art performance in the MuJoCo benchmark environments, being about 3x more sample efficient than state-of-the-art algorithms such as PETS, TD3, and SAC.
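A sketch of the POPLIN idea (illustrative assumptions only, not the paper's code): instead of starting the planner from a zero-mean action distribution, roll the policy through the model to get an initial action sequence, then refine it with a few iterations of CEM-style search around that sequence.

```python
import numpy as np

rng = np.random.default_rng(0)
model = lambda s, a: 0.9 * s + 0.5 * a        # stand-in learned dynamics
policy = lambda s: np.clip(-0.8 * s, -1, 1)   # stand-in policy network

def score(s0, seq):
    s, ret = s0, 0.0
    for a in seq:
        s = model(s, a)
        ret += -abs(s)                         # toy reward
    return ret

def poplin_plan(s0, horizon=10, iters=3, pop=100, elite=10):
    # Initialize the search distribution from the policy's own rollout.
    mean, s = np.zeros(horizon), s0
    for t in range(horizon):
        mean[t] = policy(s)
        s = model(s, mean[t])
    std = 0.2 * np.ones(horizon)
    for _ in range(iters):                     # CEM-style refinement
        pops = mean + std * rng.normal(size=(pop, horizon))
        scores = np.array([score(s0, p) for p in pops])
        elites = pops[np.argsort(scores)[-elite:]]
        mean, std = elites.mean(0), elites.std(0) + 1e-3
    return mean[0]

print("refined first action:", poplin_plan(1.0))
```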
Challenges of Real-World Reinforcement Learning
- ArXiv, 2019
This paper presents a set of nine unique challenges that must be addressed to bring RL to production on real-world problems, along with an example domain modified to exhibit these challenges as a testbed for practical RL research.
Benchmarking Model-Based Reinforcement Learning
- ArXiv, 2019
This paper gathers a wide collection of MBRL algorithms, proposes over 18 benchmarking environments specially designed for MBRL, and describes three key research challenges for future MBRL research: the dynamics bottleneck, the planning horizon dilemma, and the early-termination dilemma.
Baconian: A Unified Open-source Framework for Model-Based Reinforcement Learning
- 2019
This work develops a flexible and modularized framework, Baconian, which allows researchers to easily implement an MBRL testbed by customizing or building upon the provided modules and algorithms.