Corpus ID: 244129955

On Effective Scheduling of Model-based Reinforcement Learning

Hang Lai, Jian Shen, Weinan Zhang, Yimin Huang, Xingzhi Zhang, Ruiming Tang, Yong Yu, Zhenguo Li
Model-based reinforcement learning has attracted wide attention due to its superior sample efficiency. Despite its impressive success so far, it remains unclear how to appropriately schedule important hyperparameters, such as the real data ratio for policy optimization in Dyna-style model-based algorithms, to achieve adequate performance. In this paper, we first theoretically analyze the role of real data in policy training, which suggests gradually increasing the ratio of real data… 
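The scheduling idea the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the linear schedule endpoints, buffer representation, and helper names (`real_data_ratio`, `sample_training_batch`) are all assumptions made for the example.

```python
import random

def real_data_ratio(step, total_steps, start=0.1, end=0.9):
    """Gradually increase the fraction of real transitions used for policy
    optimization over training. The linear shape and the endpoint values
    here are illustrative, not taken from the paper."""
    frac = min(step / total_steps, 1.0)
    return start + frac * (end - start)

def sample_training_batch(real_buffer, model_buffer, batch_size, ratio):
    """Draw a mixed Dyna-style batch: roughly `ratio` of the samples come
    from real environment transitions, the rest from model rollouts."""
    n_real = int(batch_size * ratio)
    batch = random.sample(real_buffer, min(n_real, len(real_buffer)))
    batch += random.sample(model_buffer, batch_size - len(batch))
    return batch
```

A training loop would call `real_data_ratio(step, total_steps)` each iteration and pass the result to `sample_training_batch`, so policy updates rely more on real data as the policy matures.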


On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning
This work demonstrates that this problem can be tackled effectively with automated HPO, and shows that tuning several MBRL hyperparameters dynamically, i.e. during training itself, further improves performance compared to using hyperparameters kept static for the whole training.
Intelligent Trainer for Dyna-Style Model-Based Deep Reinforcement Learning
Model-based reinforcement learning (MBRL) has been proposed as a promising alternative for tackling the high sampling cost of canonical RL by leveraging a learned system dynamics model.
Model-Based Reinforcement Learning via Meta-Policy Optimization
This work proposes Model-Based Meta-Policy-Optimization (MB-MPO), an approach that foregoes the strong reliance on accurate learned dynamics models and instead uses an ensemble of learned dynamics models to train a policy that can quickly adapt to any model in the ensemble with one policy gradient step.
Model-Ensemble Trust-Region Policy Optimization
This paper analyzes the behavior of vanilla model-based reinforcement learning methods when deep neural networks are used to learn both the model and the policy, and shows that the learned policy tends to exploit regions where insufficient data is available for the model to be learned, causing instability in training.
Benchmarking Model-Based Reinforcement Learning
This paper gathers a wide collection of MBRL algorithms and proposes over 18 benchmarking environments specially designed for MBRL, and describes three key research challenges for future MBRL research: the dynamics bottleneck, the planning horizon dilemma, and the early-termination dilemma.
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
This paper proposes a new algorithm called probabilistic ensembles with trajectory sampling (PETS) that combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation, which matches the asymptotic performance of model-free algorithms on several challenging benchmark tasks, while requiring significantly fewer samples.
Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion
Stochastic ensemble value expansion (STEVE), a novel model-based technique that addresses this issue by dynamically interpolating between model rollouts of various horizon lengths for each individual example, outperforms model-free baselines on challenging continuous control benchmarks with an order-of-magnitude increase in sample efficiency.
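The interpolation mechanism described above can be sketched with inverse-variance weighting: candidate value targets from different rollout horizons are combined so that horizons on which the ensemble disagrees receive less weight. This is a simplified illustration of the idea, not STEVE's exact formulation; the function name and dict-based interface are assumptions.

```python
import numpy as np

def interpolated_target(candidate_targets):
    """Combine k-step value targets from different rollout horizons by
    inverse-variance weighting. `candidate_targets` maps each horizon to
    an array of target estimates, one per ensemble member; a horizon
    with high ensemble disagreement (variance) gets low weight."""
    means, inv_vars = [], []
    for targets in candidate_targets.values():
        t = np.asarray(targets, dtype=float)
        means.append(t.mean())
        inv_vars.append(1.0 / (t.var() + 1e-8))  # epsilon avoids div-by-zero
    means, inv_vars = np.array(means), np.array(inv_vars)
    weights = inv_vars / inv_vars.sum()
    return float((weights * means).sum())
```

With this weighting, a horizon whose ensemble members agree closely dominates the final target, which is the core intuition behind interpolating across horizon lengths per example.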
Algorithmic Framework for Model-based Reinforcement Learning with Theoretical Guarantees
A novel algorithmic framework for designing and analyzing model-based RL algorithms with theoretical guarantees is introduced and a meta-algorithm with a theoretical guarantee of monotone improvement to a local maximum of the expected reward is designed.
When to Trust Your Model: Model-Based Policy Optimization
This paper first formulates and analyzes a model-based reinforcement learning algorithm with a guarantee of monotonic improvement at each step, then demonstrates that a simple procedure of using short model-generated rollouts branched from real data has the benefits of more complicated model-based algorithms without the usual pitfalls.
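The branched-rollout procedure described above can be sketched as follows: start short model rollouts from states already stored in the real replay buffer, rather than from the initial state distribution. This is a schematic sketch, not MBPO's implementation; `model_step`, `policy`, and the state representation are placeholder assumptions.

```python
import random

def branched_rollouts(real_states, model_step, policy, horizon, n_starts):
    """Generate synthetic transitions by branching short model rollouts
    from real states. `policy(s)` returns an action and `model_step(s, a)`
    returns (next_state, reward) under the learned dynamics model.
    Keeping `horizon` short limits compounding model error."""
    synthetic = []
    starts = random.sample(real_states, min(n_starts, len(real_states)))
    for s in starts:
        for _ in range(horizon):
            a = policy(s)
            s_next, r = model_step(s, a)
            synthetic.append((s, a, r, s_next))
            s = s_next
    return synthetic
```

The synthetic transitions would then be mixed with real data for policy optimization, giving model-based data efficiency while the short branch length keeps model error in check.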
Model-Augmented Actor-Critic: Backpropagating through Paths
This paper builds a policy optimization algorithm that uses the pathwise derivative of the learned model and policy across future timesteps, matches the asymptotic performance of model-free algorithms, and scales to long horizons, a regime where past model-based approaches have typically struggled.