The Value Equivalence Principle for Model-Based Reinforcement Learning
@article{Grimm2020TheVE,
  title   = {The Value Equivalence Principle for Model-Based Reinforcement Learning},
  author  = {Christopher Grimm and Andr{\'e} Barreto and Satinder Singh and David Silver},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2011.03506}
}
Learning models of the environment from data is often viewed as an essential component to building intelligent reinforcement learning (RL) agents. The common practice is to separate the learning of the model from its use, by constructing a model of the environment's dynamics that correctly predicts the observed state transitions. In this paper we argue that the limited representational resources of model-based RL agents are better used to build models that are directly useful for value-based…
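For intuition, the paper's notion can be stated in terms of Bellman operators: two models are value equivalent with respect to a set of policies Π and a set of functions V if both models induce the same Bellman update T_π v for every π in Π and v in V. Below is a minimal NumPy sketch of the induced discrepancy, assuming a small tabular MDP; the array names, shapes, and the squared-error form are illustrative choices, not taken verbatim from the paper.

```python
import numpy as np

def bellman_update(r, p, pi, v, gamma=0.9):
    """One application of the Bellman operator T_pi under a model (r, p).
    r: (S, A) rewards, p: (S, A, S) transition probs, pi: (S, A) policy, v: (S,) values."""
    q = r + gamma * np.einsum("sat,t->sa", p, v)   # Q(s,a) = r(s,a) + gamma * E_p[v(s')]
    return np.einsum("sa,sa->s", pi, q)            # (T_pi v)(s) = E_{a~pi}[Q(s,a)]

def value_equivalence_loss(model, env, policies, functions, gamma=0.9):
    """Squared discrepancy between the Bellman updates induced by the learned
    model and by the environment, summed over the chosen policies and functions."""
    loss = 0.0
    for pi in policies:
        for v in functions:
            t_model = bellman_update(model["r"], model["p"], pi, v, gamma)
            t_env = bellman_update(env["r"], env["p"], pi, v, gamma)
            loss += np.sum((t_model - t_env) ** 2)
    return loss
```

Minimizing a discrepancy of this kind over a restricted model class concentrates the model's limited capacity on exactly the updates that planning will perform, which is the paper's central argument.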
19 Citations
Proper Value Equivalence
- Computer Science, NeurIPS
- 2021
A loss function is constructed for learning proper value-equivalent (PVE) models, and it is argued that popular algorithms such as MuZero can be understood as minimizing an upper bound on this loss.
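As a hedged sketch of what such a loss might look like: proper value equivalence asks the learned model to reproduce the environment's value functions themselves, rather than individual Bellman updates, so a natural loss sums the value-function discrepancies over a set of policies Π. The notation below is a plausible reconstruction for illustration, not copied from the paper:

$$\ell_{\mathrm{PVE}}(\tilde{m}) = \sum_{\pi \in \Pi} \big\| v^{\pi}_{\tilde{m}} - v^{\pi} \big\|^{2}, \qquad \text{where } v^{\pi}_{\tilde{m}} \text{ is the value of } \pi \text{ computed in the model } \tilde{m}.$$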
Model-Advantage Optimization for Model-Based Reinforcement Learning
- Computer Science, ArXiv
- 2021
This work proposes a novel value-aware objective that is an upper bound on the absolute performance difference of a policy across two models, and proposes a general-purpose algorithm that modifies the standard MBRL pipeline, enabling learning with value-aware objectives.
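Bounds of this flavor typically follow a simulation-lemma argument: the return gap of a policy across two models is controlled by how differently the models propagate a value function. A generic version of such a bound is shown below; the exact constants, normalization conventions, and the paper's definition of "model advantage" may differ.

$$\big| J_{m_1}(\pi) - J_{m_2}(\pi) \big| \;\le\; \frac{\gamma}{1-\gamma}\, \mathbb{E}_{(s,a)\sim d^{\pi}_{m_1}}\Big[\big|\, \mathbb{E}_{s'\sim m_1(\cdot\mid s,a)}[v^{\pi}_{m_2}(s')] - \mathbb{E}_{s'\sim m_2(\cdot\mid s,a)}[v^{\pi}_{m_2}(s')] \,\big|\Big],$$

where $d^{\pi}_{m_1}$ is the normalized discounted state-action occupancy of π under model $m_1$.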
Self-Consistent Models and Values
- Computer Science, NeurIPS
- 2021
This work investigates a way of augmenting model-based RL by additionally encouraging a learned model and value function to be jointly self-consistent, and finds that, with appropriate choices, self-consistency helps both policy evaluation and control.
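A minimal sketch of one way such a self-consistency term could be written, assuming a learned latent model (`model`) and value function (`value_fn`); the function names and the specific one-step form are illustrative assumptions, not the paper's exact objective.

```python
import torch

def self_consistency_loss(model, value_fn, state, action, gamma=0.99):
    """Penalize disagreement between the value function and a one-step backup
    computed through the learned model; gradients can be stopped on either side,
    depending on whether the model or the value function is being regularized."""
    pred_reward, pred_next_state = model(state, action)       # model's (r, s') prediction
    backup = pred_reward + gamma * value_fn(pred_next_state)  # model-based one-step target
    return torch.mean((value_fn(state) - backup) ** 2)
```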
Value Gradient weighted Model-Based Reinforcement Learning
- Computer Science, ArXiv
- 2022
This work proposes the Value-Gradient weighted Model loss (VaGraM), a novel method for value-aware model learning which improves the performance of MBRL in challenging settings, such as small model capacity and the presence of distracting state dimensions.
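The core idea can be sketched as replacing a plain mean-squared model error with one weighted by the value function's gradient, so model errors along directions that matter to the value are penalized more. A rough PyTorch-style sketch under that reading follows; the names and exact scaling are assumptions, not the paper's precise loss.

```python
import torch

def value_gradient_weighted_loss(pred_next_state, true_next_state, value_fn):
    """Penalize the model error projected onto the value function's gradient at the
    observed next state, instead of a plain MSE over all state dimensions."""
    s_next = true_next_state.detach().requires_grad_(True)
    (grad_v,) = torch.autograd.grad(value_fn(s_next).sum(), s_next)  # dV/ds' at the real next state
    err = pred_next_state - true_next_state                          # model's prediction error
    return torch.mean(torch.sum(grad_v.detach() * err, dim=-1) ** 2)
```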
Model-Advantage and Value-Aware Models for Model-Based Reinforcement Learning: Bridging the Gap in Theory and Practice
- Computer Science
- 2021
This work identifies the issue of stale value estimates that arises when value-aware objectives are naively substituted for maximum likelihood in Dyna-style model-based RL algorithms, and proposes a remedy that bridges the long-standing gap between the theory and practice of value-aware model learning.
Model-Value Inconsistency as a Signal for Epistemic Uncertainty
- Computer Science, Economics, ArXiv
- 2021
This work provides empirical evidence, in both tabular settings and function-approximation settings with pixel observations, that self-inconsistency is useful as a signal for exploration, for acting safely under distribution shift, and for robustifying value-based planning with a learned model.
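A rough sketch of the underlying signal: unroll the learned model for different numbers of steps, compute a model-based value estimate from each horizon, and use the disagreement among these estimates as an uncertainty measure. The helper names below are assumptions for illustration.

```python
import numpy as np

def k_step_value_estimates(model, value_fn, state, action_fn, gamma=0.99, max_k=5):
    """Return the k-step model-based value estimates
    v_k(s) = sum_{i<k} gamma^i r_i + gamma^k v(s_k) for k = 0..max_k."""
    estimates = [value_fn(state)]                 # k = 0: the direct value estimate
    ret, discount, s = 0.0, 1.0, state
    for _ in range(max_k):
        a = action_fn(s)                          # e.g. the current greedy policy
        r, s = model(s, a)                        # model-predicted reward and next state
        ret += discount * r
        discount *= gamma
        estimates.append(ret + discount * value_fn(s))
    return estimates

def inconsistency(model, value_fn, state, action_fn):
    """Disagreement across horizons, used as an epistemic-uncertainty signal."""
    return float(np.std(k_step_value_estimates(model, value_fn, state, action_fn)))
```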
Procedural Generalization by Planning with Self-Supervised World Models
- Computer Science, ArXiv
- 2021
Overall, this work suggests that building generalizable agents requires moving beyond the single-task, model-free paradigm and towards self-supervised model-based agents that are trained in rich, procedural, multi-task environments.
High-Accuracy Model-Based Reinforcement Learning, a Survey
- Computer Science, ArXiv
- 2021
This paper surveys model-based reinforcement learning methods, explaining in detail how they work and what their strengths and weaknesses are, and concludes with a research agenda for future work to make the methods more robust and more widely applicable.
Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation
- Computer Science, ArXiv
- 2021
This work proposes an end-to-end approach for model learning which directly optimizes the expected returns using implicit differentiation, and provides theoretical and empirical evidence highlighting the benefits of this approach in the model-misspecification regime compared to likelihood-based methods.
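The mechanism such methods rely on is the implicit function theorem: if the inner optimization (e.g. value iteration or policy optimization inside the model) converges to a fixed point $w^{*}(\theta)$ of some update $f$, then gradients of the outer objective with respect to the model parameters $\theta$ can be obtained without unrolling the inner loop. A generic statement of that step, not the paper's specific estimator:

$$w^{*}(\theta) = f\big(w^{*}(\theta), \theta\big) \;\Longrightarrow\; \frac{\partial w^{*}}{\partial \theta} = \Big(I - \frac{\partial f}{\partial w}\Big|_{w^{*}}\Big)^{-1} \frac{\partial f}{\partial \theta}\Big|_{w^{*}}, \qquad \frac{\partial J}{\partial \theta} = \frac{\partial J}{\partial w^{*}}\,\frac{\partial w^{*}}{\partial \theta}.$$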
Visualizing MuZero Models
- Computer Science, ArXiv
- 2021
This paper visualizes the latent representation of MuZero agents and finds that action trajectories may diverge between observation embeddings and internal state transition dynamics, which could lead to instability during planning.
References
Showing 1-10 of 51 references
SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning
- Computer Science, ICML
- 2019
This paper presents a method for learning representations that are suitable for iterative model-based policy improvement, even when the underlying dynamical system has complex dynamics and image observations: the representations are optimized for inferring simple dynamics and cost models given data from the current policy.
Model-Based Reinforcement Learning with Value-Targeted Regression
- Computer Science, L4DC
- 2020
This paper proposes a model-based RL algorithm based on the optimism principle, and derives a regret bound that is independent of the total number of states or actions and is close to a lower bound of $\Omega(\sqrt{HdT})$.
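The name describes the core estimator: instead of fitting transition probabilities by maximum likelihood, the model parameters are fit by regressing the model's predicted value of the next state onto the realized value target. A schematic form of that regression is shown below; it is a paraphrase of the idea rather than the paper's exact estimator, which also includes an optimism bonus.

$$\hat{\theta}_t \;=\; \arg\min_{\theta}\; \sum_{i < t} \Big( \big\langle P_{\theta}(\cdot \mid s_i, a_i),\, V_i \big\rangle - V_i(s_{i+1}) \Big)^{2},$$

where $V_i$ is the value estimate in use when transition $i$ was observed.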
TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning
- Computer Science, ICLR
- 2018
This work presents TreeQN, a differentiable, recursive, tree-structured model that serves as a drop-in replacement for any value-function network in deep RL with discrete actions, and ATreeC, an actor-critic variant that augments TreeQN with a softmax layer to form a stochastic policy network.
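A stripped-down sketch of the tree-structured backup that such architectures differentiate through: the observation is embedded, a learned transition function expands one latent child per action, rewards and leaf values are predicted, and Q-values are formed by backing up over children (a max backup is used here for simplicity). The module names are illustrative assumptions.

```python
import torch

def tree_backup_q(embed, transition, reward_fn, value_fn, obs, num_actions, depth=2, gamma=0.99):
    """Q(s, a) from a depth-limited, fully expanded latent tree with max backups."""
    def value_of(z, d):
        if d == 0:
            return value_fn(z)                                    # leaf: predicted value
        q = torch.stack(
            [reward_fn(z, a) + gamma * value_of(transition(z, a), d - 1)
             for a in range(num_actions)], dim=-1)
        return q.max(dim=-1).values                               # backup: best child

    z0 = embed(obs)
    return torch.stack(
        [reward_fn(z0, a) + gamma * value_of(transition(z0, a), depth - 1)
         for a in range(num_actions)], dim=-1)
```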
Reinforcement learning with misspecified model classes
- Computer Science, 2013 IEEE International Conference on Robotics and Automation
- 2013
An algorithm is presented for which the highest-performing model in the model class is guaranteed to be found given unlimited data and computation, by explicitly selecting the model that achieves the highest expected reward rather than the most likely model.
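The selection rule can be sketched directly: plan an optimal policy inside each candidate model, estimate each resulting policy's return in the real environment, and keep the model whose policy actually performs best, rather than the model with the highest likelihood. The helper names below are assumptions for illustration.

```python
def select_model_by_reward(candidate_models, plan_in_model, evaluate_on_real_env):
    """Pick the model whose induced optimal policy earns the highest real return.
    plan_in_model(model) -> policy; evaluate_on_real_env(policy) -> estimated return."""
    best_model, best_return = None, float("-inf")
    for model in candidate_models:
        policy = plan_in_model(model)            # solve the MDP described by this model
        ret = evaluate_on_real_env(policy)       # Monte Carlo estimate from real rollouts/data
        if ret > best_return:
            best_model, best_return = model, ret
    return best_model
```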
Iterative Value-Aware Model Learning
- Computer Science, NeurIPS
- 2018
This work introduces Iterative VAML, a new model-based reinforcement learning (MBRL) framework that incorporates the underlying decision problem into learning the transition model of the environment, and that exploits the structure of how planning is performed (i.e., through approximate value iteration) to devise a simpler optimization problem.
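The per-iteration objective can be sketched as follows: given the current value estimate $\hat{V}_t$ produced by approximate value iteration, the model is fit so that it predicts the same expected next-state value as the environment. This is a paraphrase of the idea; the paper states it with empirical expectations over data:

$$\hat{m}_{t+1} \;\in\; \arg\min_{\hat{m}}\; \mathbb{E}_{(s,a)\sim \mathcal{D}} \Big[ \Big( \mathbb{E}_{s' \sim \hat{m}(\cdot\mid s,a)}\big[\hat{V}_t(s')\big] - \mathbb{E}_{s' \sim m^{*}(\cdot\mid s,a)}\big[\hat{V}_t(s')\big] \Big)^{2} \Big].$$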
Algorithms for Reinforcement Learning
- Computer Science, Algorithms for Reinforcement Learning
- 2010
This book focuses on those reinforcement learning algorithms that build on the powerful theory of dynamic programming, gives a fairly comprehensive catalog of learning problems, and describes the core ideas together with a discussion of their theoretical properties and limitations.
Value-Aware Loss Function for Model-based Reinforcement Learning
- Computer Science, AISTATS
- 2017
This work argues that estimating a generative model that minimizes a probabilistic loss, such as the log-loss, is overkill because it does not take into account the underlying structure of the decision problem and the RL algorithm that intends to solve it.
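The contrast with the log-loss can be made concrete: rather than matching the transition distribution everywhere, the value-aware loss only penalizes model errors to the extent that they change expected values, taking a worst case over a value-function class $\mathcal{F}$. The notation follows the usual presentation of this objective:

$$\mathcal{L}(\hat{m}) \;=\; \mathbb{E}_{(s,a)\sim \nu} \left[ \sup_{v \in \mathcal{F}} \Big( \int \big( m^{*}(s' \mid s,a) - \hat{m}(s' \mid s,a) \big)\, v(s')\, \mathrm{d}s' \Big)^{2} \right].$$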
Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping
- Mathematics, UAI
- 2008
This paper develops an explicitly model-based approach that extends the Dyna architecture to linear function approximation, and proves that linear Dyna-style planning converges to a unique solution independent of the generating distribution under natural conditions.
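The planning step in that setting can be sketched as follows: the learned linear model predicts an expected next feature vector and reward for a feature vector phi, and the value weights theta are updated with a TD-style backup against those predictions. This is a simplified, unprioritized sketch; the paper combines such updates with prioritized sweeping.

```python
import numpy as np

def linear_dyna_planning_step(theta, F, b, phi, alpha=0.1, gamma=0.9):
    """One planning update with a linear model.
    F: (d, d) expected next-feature matrix, b: (d,) reward weights,
    phi: (d,) feature vector sampled for planning, theta: (d,) value weights."""
    next_phi = F @ phi                       # model's expected next feature vector
    reward = b @ phi                         # model's expected reward
    td_error = reward + gamma * theta @ next_phi - theta @ phi
    return theta + alpha * td_error * phi    # TD(0)-style update toward the model's backup
```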
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
- Computer Science, Artif. Intell.
- 1999
Value Prediction Network
- Computer Science, NIPS
- 2017
This paper proposes a novel deep reinforcement learning architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network and outperforms Deep Q-Network on several Atari games even with short-lookahead planning.