Corpus ID: 233296008

Planning with Expectation Models for Control

Katya Kudashkina, Yi Wan, Abhishek Naik, Richard S. Sutton
In model-based reinforcement learning (MBRL), Wan et al. (2019) showed conditions under which the environment model could produce the expectation of the next feature vector rather than the full distribution, or a sample thereof, with no loss in planning performance. Such expectation models are of interest when the environment is stochastic and non-stationary, and the model is approximate, such as when it is learned using function approximation. In these cases a full distribution model may be… 
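As a minimal illustration of the expectation-model idea (the toy linear dynamics and all names below are illustrative, not from the paper): instead of representing the full distribution of the next feature vector, the model stores only its expectation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stochastic "environment": from feature vector x, the next
# feature vector is A @ x plus zero-mean noise, so its expectation is A @ x.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])

def sample_next(x):
    return A @ x + rng.normal(scale=0.1, size=2)

# An expectation model stores only the mean outcome, not the distribution.
def expectation_model(x):
    return A @ x

x = np.array([1.0, -0.5])
samples = np.array([sample_next(x) for _ in range(100_000)])
print(np.allclose(samples.mean(axis=0), expectation_model(x), atol=0.01))  # True
```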


Planning with Expectation Models

It is shown that planning with an expectation model is equivalent to planning with a distribution model if the state value function is linear in state features, and two common parametrization choices for approximating the expectation are analyzed.
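The linearity condition can be checked numerically in a few lines (a hedged sketch with made-up weights and features): when v(x) = w·x, averaging the values of sampled next-feature vectors gives the same answer as evaluating v at their mean, so an expectation model loses nothing.

```python
import numpy as np

rng = np.random.default_rng(1)

w = np.array([0.5, -1.2, 2.0])   # weights of a linear value function
def v(x):                        # v(x) = w . x, linear in the features
    return w @ x

# Sampled next-feature vectors from some stochastic transition.
next_xs = rng.normal(size=(50_000, 3)) + np.array([1.0, 2.0, 3.0])

# Planning with a distribution/sample model: average the sampled values.
lhs = np.mean([v(x) for x in next_xs])
# Planning with an expectation model: evaluate v at the expected features.
rhs = v(next_xs.mean(axis=0))

print(np.isclose(lhs, rhs))  # True: E[v(X')] = v(E[X']) when v is linear
```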

Hill Climbing on Value Estimates for Search-control in Dyna

This work proposes generating planning states from the trajectory obtained by hill climbing on the current estimate of the value function, and finds a benefit specifically from using samples generated by climbing from low-value to high-value regions of the current value estimates.
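A rough sketch of this search-control idea, using a hand-built quadratic value estimate in place of a learned one (the peak location and step size are illustrative assumptions):

```python
import numpy as np

# Illustrative value estimate with a peak at (2, -1); the actual method
# climbs the agent's learned value function instead.
peak = np.array([2.0, -1.0])
def value(s):
    return -np.sum((s - peak) ** 2)

def value_grad(s):
    return -2.0 * (s - peak)

# Hill climbing: follow the value gradient from a start state, recording
# the trajectory; these states are then fed to Dyna-style planning.
def hill_climb(s, steps=50, lr=0.1):
    trajectory = [s.copy()]
    for _ in range(steps):
        s = s + lr * value_grad(s)
        trajectory.append(s.copy())
    return trajectory

traj = hill_climb(np.zeros(2))
print(value(traj[-1]) > value(traj[0]))  # True: later states have higher value
```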

Learning and Using Models

This chapter surveys the types of models used in model-based methods, ways of learning them, and methods for planning with these models, and examines the sample efficiency of a few methods, which is highly dependent on having intelligent exploration mechanisms.

The Effect of Planning Shape on Dyna-style Planning in High-dimensional State Spaces

This work finds that planning shape has a profound impact on the efficacy of Dyna for both perfect and learned models, suggesting that Dyna may be a viable approach to model-based reinforcement learning in the Arcade Learning Environment and other high-dimensional problems.

Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models

This paper proposes a new algorithm called probabilistic ensembles with trajectory sampling (PETS) that combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation, which matches the asymptotic performance of model-free algorithms on several challenging benchmark tasks, while requiring significantly fewer samples.
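A heavily simplified sketch of the PETS recipe on a toy 1-D problem (the dynamics, the "ensemble" of perturbed models, and the random-shooting optimizer standing in for PETS's CEM optimizer are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 1-D dynamics: s' = s + a + noise; reward is -s'^2 (drive state to 0).
def true_step(s, a):
    return s + a + rng.normal(scale=0.05)

# "Ensemble" of learned models, each a slightly perturbed copy of the true
# dynamics (stand-ins for bootstrapped probabilistic neural networks).
ensemble = [lambda s, a, b=b: s + a + b for b in rng.normal(scale=0.02, size=5)]

def rollout_return(s, actions, model):
    ret = 0.0
    for a in actions:
        s = model(s, a)
        ret += -s ** 2
    return ret

# Trajectory sampling: score each candidate action sequence under every
# ensemble member, average, and execute the best first action.
def plan(s0, n_candidates=200, horizon=5):
    best_a, best_ret = 0.0, -np.inf
    for _ in range(n_candidates):
        actions = rng.uniform(-1, 1, size=horizon)
        ret = np.mean([rollout_return(s0, actions, m) for m in ensemble])
        if ret > best_ret:
            best_a, best_ret = actions[0], ret
    return best_a

s = 2.0
for _ in range(10):
    s = true_step(s, plan(s))
print(abs(s) < 1.0)  # True: planner drives the state from 2.0 toward 0
```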

Model-Based Reinforcement Learning with an Approximate, Learned Model

It is shown that model-based methods do indeed perform better than model-free reinforcement learning in experiments on the Mountain Car task, which requires approximation of both the value function and the model.

Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping

This paper develops an explicitly model-based approach extending the Dyna architecture to linear function approximation, and proves that, under natural conditions, linear Dyna-style planning converges to a unique solution independent of the generating distribution.
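A compact sketch of the linear Dyna planning update (the feature-transition matrix F and reward vector b below are made-up stand-ins for a learned model; planning applies the standard TD(0) rule to model-generated transitions):

```python
import numpy as np

rng = np.random.default_rng(3)

gamma, alpha = 0.9, 0.1
F = np.array([[0.5, 0.2],
              [0.1, 0.6]])    # assumed learned feature-transition model
b = np.array([1.0, 0.5])      # assumed learned reward model

# Planning: repeatedly sample a feature vector, imagine a transition with
# the model, and apply a TD(0) update to the value weights w.
w = np.zeros(2)
for _ in range(2000):
    x = rng.normal(size=2)                      # sampled planning feature
    delta = b @ x + gamma * (w @ (F @ x)) - w @ x
    w = w + alpha * delta * x

# The fixed point satisfies w = b + gamma * F^T w,
# i.e. w* = (I - gamma * F^T)^-1 b, independent of how x was sampled.
w_star = np.linalg.solve(np.eye(2) - gamma * F.T, b)
print(np.allclose(w, w_star, atol=0.05))  # True
```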

Organizing Experience: a Deeper Look at Replay Mechanisms for Sample-Based Planning in Continuous State Domains

This paper revisits sample-based planning in stochastic and continuous domains with learned models, and introduces a semi-parametric model-learning approach, called Reweighted Experience Models (REMs), that makes it simple to sample next states or predecessors.

Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion

Stochastic ensemble value expansion (STEVE), a novel model-based technique that dynamically interpolates between model rollouts of various horizon lengths for each individual example, outperforms model-free baselines on challenging continuous control benchmarks with an order-of-magnitude increase in sample efficiency.
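The horizon-interpolation idea can be sketched with fabricated numbers (the target values below are invented purely to show the inverse-variance weighting; real STEVE computes them from ensemble model rollouts):

```python
import numpy as np

# Candidate value targets from rollouts of different horizons.
# Rows: ensemble members; columns: horizons 1..4. Longer horizons here
# disagree more across the ensemble, signalling model uncertainty.
targets = np.array([[1.00, 1.10, 0.60, 2.0],
                    [1.02, 0.90, 1.40, 0.1],
                    [0.98, 1.00, 1.00, 1.9]])

means = targets.mean(axis=0)
variances = targets.var(axis=0) + 1e-8          # avoid division by zero
weights = (1.0 / variances) / np.sum(1.0 / variances)

# STEVE-style target: horizons with high ensemble variance get less weight.
steve_target = np.sum(weights * means)
print(int(weights.argmax()))  # 0: the low-variance 1-step horizon dominates
```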

Self-Correcting Models for Model-Based Reinforcement Learning

This paper theoretically analyzes Hallucinated Replay's approach, illuminates settings in which it is likely to be effective or ineffective, and presents a novel error bound, showing that a model's ability to self-correct is more tightly related to MBRL performance than one-step prediction error.