Corpus ID: 244129955

On Effective Scheduling of Model-based Reinforcement Learning

@inproceedings{Lai2021OnES,
  title={On Effective Scheduling of Model-based Reinforcement Learning},
  author={Hang Lai and Jian Shen and Weinan Zhang and Yimin Huang and Xingzhi Zhang and Ruiming Tang and Yong Yu and Zhenguo Li},
  booktitle={NeurIPS},
  year={2021}
}
Model-based reinforcement learning has attracted wide attention due to its superior sample efficiency. Despite its impressive success so far, it is still unclear how to appropriately schedule important hyperparameters, such as the real data ratio for policy optimization in Dyna-style model-based algorithms, to achieve adequate performance. In this paper, we first theoretically analyze the role of real data in policy training, which suggests that gradually increasing the ratio of real data…
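To make the scheduling idea above concrete, here is a minimal sketch (not the authors' implementation) of a linearly increasing real-data-ratio schedule for Dyna-style policy optimization: each policy-training batch mixes transitions from the real environment buffer and the model-generated buffer. All function and buffer names here are hypothetical.

```python
import random


def real_data_ratio(epoch, total_epochs, start=0.1, end=0.9):
    """Linearly increase the fraction of real transitions per batch over training."""
    frac = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    return start + (end - start) * frac


def sample_mixed_batch(real_buffer, model_buffer, batch_size, ratio):
    """Draw a policy-training batch with roughly `ratio` real and `1 - ratio` simulated data."""
    n_real = int(round(batch_size * ratio))
    n_model = batch_size - n_real
    batch = random.sample(real_buffer, min(n_real, len(real_buffer)))
    batch += random.sample(model_buffer, min(n_model, len(model_buffer)))
    return batch
```

For example, `real_data_ratio(epoch=50, total_epochs=100)` returns roughly 0.5, so a batch sampled at mid-training would be about half real and half model-generated data under this assumed schedule.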
1 Citation
A Survey on Model-based Reinforcement Learning
TLDR
This survey takes a review of model-based reinforcement learning (MBRL) with a focus on the recent progress in deep RL, and discusses the applicability and advantages of MBRL in real-world tasks.

References

Showing 1-10 of 46 references
On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning
TLDR
This work demonstrates that this problem can be tackled effectively with automated hyperparameter optimization (HPO), and shows that tuning several MBRL hyperparameters dynamically, i.e., during training itself, further improves performance compared to using hyperparameters that are kept static for the whole training.
Intelligent Trainer for Dyna-Style Model-Based Deep Reinforcement Learning
Model-based reinforcement learning (MBRL) has been proposed as a promising alternative to tackle the high sampling cost of canonical RL by leveraging a system dynamics model.
Model-Based Reinforcement Learning via Meta-Policy Optimization
TLDR
This work proposes Model-Based Meta-Policy-Optimization (MB-MPO), an approach that foregoes the strong reliance on accurate learned dynamics models by using an ensemble of learned dynamics models to learn a policy that can quickly adapt to any model in the ensemble with one policy gradient step.
Model-Ensemble Trust-Region Policy Optimization
TLDR
This paper analyzes the behavior of vanilla model-based reinforcement learning methods when deep neural networks are used to learn both the model and the policy, and shows that the learned policy tends to exploit regions where insufficient data is available for the model to be learned, causing instability in training.
Benchmarking Model-Based Reinforcement Learning
TLDR
This paper gathers a wide collection of MBRL algorithms and proposes over 18 benchmarking environments specially designed for MBRL, and describes three key research challenges for future MBRL research: the dynamics bottleneck, the planning horizon dilemma, and the early-termination dilemma.
Model-based Policy Optimization with Unsupervised Model Adaptation
TLDR
A novel model-based reinforcement learning framework AMPO is proposed, which introduces unsupervised model adaptation to minimize the integral probability metric (IPM) between feature distributions from real and simulated data.
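As a rough illustration of the quantity being minimized, the sketch below computes one common instance of an integral probability metric, the kernel maximum mean discrepancy (MMD), between feature batches from real and simulated transitions; AMPO's actual choice of IPM and its training procedure may differ, so this is only an assumed, simplified example.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Pairwise Gaussian (RBF) kernel matrix between two feature batches."""
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))


def mmd2(real_features, sim_features, bandwidth=1.0):
    """Biased squared-MMD estimate between real and simulated feature distributions."""
    k_rr = gaussian_kernel(real_features, real_features, bandwidth)
    k_ss = gaussian_kernel(sim_features, sim_features, bandwidth)
    k_rs = gaussian_kernel(real_features, sim_features, bandwidth)
    return k_rr.mean() + k_ss.mean() - 2.0 * k_rs.mean()
```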
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
TLDR
This paper proposes a new algorithm called probabilistic ensembles with trajectory sampling (PETS) that combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation, which matches the asymptotic performance of model-free algorithms on several challenging benchmark tasks, while requiring significantly fewer samples.
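A minimal sketch of the trajectory-sampling idea follows, assuming each ensemble member is a callable that returns the mean and standard deviation of a Gaussian next-state prediction; the actual PETS algorithm trains neural-network ensembles and plans with a cross-entropy-method optimizer on top of such rollouts.

```python
import numpy as np


def trajectory_sampling(ensemble, state, action_sequence, rng):
    """Roll one particle forward, resampling which ensemble member predicts each step,
    so both epistemic (model) and aleatoric (transition) uncertainty enter the rollout."""
    states = [np.asarray(state, dtype=float)]
    for action in action_sequence:
        model = ensemble[rng.integers(len(ensemble))]  # epistemic: pick a random member
        mean, std = model(states[-1], action)          # hypothetical model interface
        states.append(rng.normal(mean, std))           # aleatoric: sample the transition
    return states
```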
Bidirectional Model-based Policy Optimization
TLDR
This paper proposes to additionally construct a backward dynamics model to reduce the reliance on accuracy in forward model predictions, and develops a novel method, called Bidirectional Model-based Policy Optimization (BMPO), to utilize both the forward model and backward model to generate short branched rollouts for policy optimization.
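The sketch below illustrates the bidirectional-rollout idea in a simplified, assumed form: starting from a real state, a short branch is generated forward with the forward model and backward with the backward model. All model and policy callables are hypothetical placeholders, not the BMPO implementation.

```python
def bidirectional_rollout(state, forward_model, backward_model,
                          policy, backward_policy, horizon):
    """Collect a short model rollout extending both forward and backward
    from a state sampled out of the real-data buffer."""
    forward_segment, s = [], state
    for _ in range(horizon):
        a = policy(s)
        s_next = forward_model(s, a)
        forward_segment.append((s, a, s_next))
        s = s_next

    backward_segment, s = [], state
    for _ in range(horizon):
        a = backward_policy(s)
        s_prev = backward_model(s, a)
        backward_segment.append((s_prev, a, s))
        s = s_prev

    # Reverse so the backward segment reads in chronological order.
    return backward_segment[::-1] + forward_segment
```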
Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion
TLDR
Stochastic ensemble value expansion (STEVE), a novel model-based technique that addresses this issue by dynamically interpolating between model rollouts of various horizon lengths for each individual example, outperforms model-free baselines on challenging continuous control benchmarks with an order-of-magnitude increase in sample efficiency.
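A minimal sketch of the interpolation step only: candidate value targets computed from rollouts of different horizon lengths are averaged with inverse-variance weights, so horizons on which the ensemble disagrees contribute less. How the per-horizon means and variances are produced (an ensemble of models and Q-functions in STEVE) is assumed and not shown.

```python
import numpy as np


def interpolate_targets(candidate_means, candidate_vars, eps=1e-8):
    """Inverse-variance weighted average over per-horizon value-target estimates."""
    means = np.asarray(candidate_means, dtype=float)
    variances = np.asarray(candidate_vars, dtype=float)
    weights = 1.0 / (variances + eps)   # low-variance (high-agreement) horizons get more weight
    weights /= weights.sum()
    return float(np.sum(weights * means))
```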
Algorithmic Framework for Model-based Reinforcement Learning with Theoretical Guarantees
TLDR
A novel algorithmic framework for designing and analyzing model-based RL algorithms with theoretical guarantees is introduced, and a meta-algorithm with a theoretical guarantee of monotone improvement to a local maximum of the expected reward is designed.