Corpus ID: 233296008

Planning with Expectation Models for Control

Katya Kudashkina, Yi Wan, Abhishek Naik, and Richard S. Sutton

In model-based reinforcement learning (MBRL), Wan et al. (2019) showed conditions under which the environment model can produce the expectation of the next feature vector, rather than the full distribution or a sample thereof, with no loss in planning performance. Such expectation models are of interest when the environment is stochastic and non-stationary and the model is approximate, as when it is learned using function approximation. In these cases a full distribution model may be…
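The idea can be sketched concretely. The following is a minimal illustration, not the paper's exact algorithm: a hypothetical learned linear expectation model (transition matrix `F`, reward weights `b`) predicts the expected next feature vector and expected reward, and planning performs TD(0)-style updates on linear value weights using those expectations.

```python
import numpy as np

# Illustrative sketch of planning with a linear *expectation* model.
# F and b stand in for a learned model; both names are hypothetical.
rng = np.random.default_rng(0)
d = 4                                  # feature dimension
F = rng.normal(size=(d, d)) * 0.1      # expected-transition matrix
b = rng.normal(size=d) * 0.1           # expected-reward weights
w = np.zeros(d)                        # value-function weights
gamma, alpha = 0.9, 0.05

def planning_step(phi, w):
    """One Dyna-style planning update from feature vector phi."""
    phi_next = F @ phi                 # expected next feature vector
    r = b @ phi                        # expected reward
    delta = r + gamma * (w @ phi_next) - w @ phi
    return w + alpha * delta * phi

for _ in range(100):
    phi = rng.normal(size=d)           # search-control: pick a feature vector
    w = planning_step(phi, w)
```

Note that the update uses only expectations of the next feature vector and reward, never a sampled next state, which is exactly what distinguishes expectation models from distribution or sample models.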

Reinforcement Learning with a Hierarchy of Abstract Models

Simulations on a set of compositionally-structured navigation tasks show that H-DYNA can learn to solve them faster than conventional RL algorithms, and that the abstract models can be used to solve stochastic control tasks.

Hill Climbing on Value Estimates for Search-control in Dyna

This work proposes to generate states by following the trajectory obtained from hill climbing the current estimate of the value function, and finds a benefit specifically from using samples generated by climbing on current value estimates from low-value to high-value regions.
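The search-control idea can be sketched as follows. This is my own minimal illustration, not the paper's code: starting from a state, follow the gradient of a (hypothetical, smooth) value estimate to collect states in higher-value regions, which could then seed Dyna-style planning updates.

```python
import numpy as np

def value(s):
    # Hypothetical smooth value estimate with its peak at the origin.
    return -np.sum(s ** 2)

def value_grad(s):
    # Analytic gradient of the value estimate above.
    return -2.0 * s

def hill_climb(s0, step=0.1, n=20):
    """Collect states along the gradient-ascent trajectory of value()."""
    states, s = [], np.array(s0, dtype=float)
    for _ in range(n):
        s = s + step * value_grad(s)   # ascend the value surface
        states.append(s.copy())
    return states

trajectory = hill_climb([1.0, -1.0])
# Each successive state has a higher estimated value than the last.
```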

Learning and Using Models

This chapter surveys some of the types of models used in model-based methods and ways of learning them, as well as methods for planning with these models, and examines the sample efficiency of a few methods, which depends heavily on having intelligent exploration mechanisms.

Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models

This paper proposes a new algorithm called probabilistic ensembles with trajectory sampling (PETS) that combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation, which matches the asymptotic performance of model-free algorithms on several challenging benchmark tasks, while requiring significantly fewer samples.
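A rough sketch of the idea behind PETS, with details heavily simplified: maintain an ensemble of probabilistic dynamics models, propagate uncertainty by sampling a model (and its noise) at each step of each rollout, and score candidate action sequences by their mean predicted return. All names and the toy dynamics below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

class GaussianModel:
    """Toy probabilistic dynamics model: s' = A s + a + Gaussian noise."""
    def __init__(self):
        self.A = np.eye(2) * 0.9 + rng.normal(scale=0.01, size=(2, 2))
        self.sigma = 0.05
    def sample(self, s, a):
        return self.A @ s + a + rng.normal(scale=self.sigma, size=2)

ensemble = [GaussianModel() for _ in range(5)]

def rollout_return(s0, actions):
    """One trajectory-sampled rollout; toy reward is -||s|| per step."""
    s, ret = np.array(s0, dtype=float), 0.0
    for a in actions:
        model = ensemble[rng.integers(len(ensemble))]  # resample a model
        s = model.sample(s, a)
        ret += -np.linalg.norm(s)
    return ret

# Score one candidate action sequence by averaging sampled rollouts.
actions = [np.zeros(2)] * 10
score = np.mean([rollout_return([1.0, 1.0], actions) for _ in range(20)])
```

In the full algorithm this scoring loop would sit inside a model-predictive controller that searches over action sequences; here only the uncertainty-propagation step is shown.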

Model-Based Reinforcement Learning with an Approximate, Learned Model

It is shown that model-based methods can indeed outperform model-free reinforcement learning; the experiments use the Mountain Car task, which requires approximating both the value function and the model.

Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping

This paper develops an explicitly model-based approach extending the Dyna architecture to linear function approximation, to prove that linear Dyna-style planning converges to a unique solution independent of the generating distribution, under natural conditions.

Organizing Experience: a Deeper Look at Replay Mechanisms for Sample-Based Planning in Continuous State Domains

The aim of this paper is to revisit sample-based planning, in stochastic and continuous domains with learned models, and introduces a semi-parametric model learning approach, called Reweighted Experience Models (REMs), that makes it simple to sample next states or predecessors.

Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion

Stochastic ensemble value expansion (STEVE), a novel model-based technique that dynamically interpolates between model rollouts of various horizon lengths for each individual example, outperforms model-free baselines on challenging continuous control benchmarks with an order-of-magnitude increase in sample efficiency.
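The interpolation step can be sketched in a few lines. This is a simplified illustration of the weighting idea, not STEVE's full algorithm: given value targets computed from rollouts of different horizons, weight each target inversely to its ensemble-estimated variance, so that unreliable long-horizon rollouts contribute less. The example numbers are hypothetical.

```python
import numpy as np

def steve_target(targets, variances, eps=1e-8):
    """Inverse-variance weighted combination of per-horizon value targets."""
    w = 1.0 / (np.asarray(variances) + eps)   # low variance -> high weight
    w = w / w.sum()                           # normalize the weights
    return float(w @ np.asarray(targets))

# Hypothetical targets from horizons 0..3 with growing model uncertainty:
# the short-horizon targets dominate the combined target.
t = steve_target([1.0, 1.1, 1.3, 2.0], [0.01, 0.02, 0.1, 1.0])
```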

Self-Correcting Models for Model-Based Reinforcement Learning

This paper theoretically analyzes Hallucinated Replay's approach, illuminates settings in which it is likely to be effective or ineffective, and presents a novel error bound, showing that a model's ability to self-correct is more tightly related to MBRL performance than one-step prediction error.

Model-Based Reinforcement Learning for Atari

Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models, is described and a comparison of several model architectures is presented, including a novel architecture that yields the best results in the authors' setting.