• Corpus ID: 226278317

# The Value Equivalence Principle for Model-Based Reinforcement Learning

@article{Grimm2020TheVE,
title={The Value Equivalence Principle for Model-Based Reinforcement Learning},
author={Christopher Grimm and Andr{\'e} Barreto and Satinder Singh and David Silver},
journal={ArXiv},
year={2020},
volume={abs/2011.03506}
}
• Published 6 November 2020
• Computer Science
• ArXiv
Learning models of the environment from data is often viewed as an essential component to building intelligent reinforcement learning (RL) agents. The common practice is to separate the learning of the model from its use, by constructing a model of the environment's dynamics that correctly predicts the observed state transitions. In this paper we argue that the limited representational resources of model-based RL agents are better used to build models that are directly useful for value-based planning. As our main contribution, we introduce the principle of value equivalence: two models are value equivalent with respect to a set of functions and policies if they yield the same Bellman updates…
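The definition above admits a direct tabular reading. Below is a minimal NumPy sketch, not code from the paper: it treats dynamics as arrays `P[a, s, s']`, assumes a known reward table, and scores a candidate model by how much its Bellman updates differ from the environment's over given sets of policies and value functions. All names are illustrative.

```python
import numpy as np

def bellman_update(P, r, v, pi, gamma=0.9):
    """One Bellman update (T_pi v)(s), with dynamics P[a, s, s'],
    rewards r[s, a], policy pi[s, a], and value function v[s]."""
    next_v = np.einsum("ast,t->sa", P, v)   # E[v(s') | s, a]
    q = r + gamma * next_v                  # one-step action values
    return np.einsum("sa,sa->s", pi, q)     # average over the policy

def value_equivalence_loss(P_model, P_env, r, policies, functions):
    """Squared gap between Bellman updates under the candidate model and
    under the environment, summed over the given policies and value
    functions. Zero iff the two are value equivalent w.r.t. those sets."""
    loss = 0.0
    for pi in policies:
        for v in functions:
            diff = (bellman_update(P_model, r, v, pi)
                    - bellman_update(P_env, r, v, pi))
            loss += float(np.sum(diff ** 2))
    return loss
```

Consistent with the abstract's claim, enlarging `policies` and `functions` makes a zero of this loss harder to achieve: with rich enough sets, only a model matching the true dynamics attains it.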

## Citations

• Computer Science
NeurIPS
• 2021
A loss function is constructed for learning proper value equivalence (PVE) models, and it is argued that popular algorithms such as MuZero can be understood as minimizing an upper bound for this loss.
• Computer Science
ArXiv
• 2022
An algorithm is introduced that iteratively computes an approximately-value-equivalent, lossy compression of the environment which an agent may feasibly target in lieu of the true model, and an information-theoretic, Bayesian regret bound is proved for this algorithm that holds for any finite-horizon, episodic sequential decision-making problem.
• Computer Science
ArXiv
• 2021
This work proposes a novel value-aware objective that is an upper bound on the absolute performance difference of a policy across two models, and a general-purpose algorithm that modifies the standard MBRL pipeline, enabling learning with value-aware objectives.
• Computer Science
NeurIPS
• 2021
This work investigates a way of augmenting model-based RL by additionally encouraging a learned model and value function to be jointly self-consistent, and finds that, with appropriate choices, self-consistency helps both policy evaluation and control (a tabular sketch of this idea appears after this list).
• Computer Science
ICLR
• 2021
This paper studies the performance of MuZero, a state-of-the-art model-based reinforcement learning algorithm that shares components with many other MBRL algorithms, and suggests that planning alone is insufficient to drive strong generalization.
• Computer Science
Found. Trends Mach. Learn.
• 2023
A survey of the integration of planning and learning, better known as model-based reinforcement learning, and a broad conceptual overview of planning-learning combinations for MDP optimization are presented.
• Computer Science
ICLR
• 2022
The Value-Gradient weighted Model loss (VaGraM) is proposed: a novel method for value-aware model learning that improves the performance of MBRL in challenging settings, such as small model capacity and the presence of distracting state dimensions (a sketch of such a loss appears after this list).
• Computer Science
• 2021
This work identifies the issue of stale value estimates that arises when value-aware objectives are naively substituted for maximum likelihood in Dyna-style model-based RL algorithms, and proposes a remedy that bridges the long-standing gap between the theory and practice of value-aware model learning.
• Computer Science
ArXiv
• 2022
It is suggested that even though decision-time planning does not perform as well as background planning in their classical instantiations, in their modern instantiations it can perform on par with or better than background planning in both the planning & learning and transfer learning settings.
• Computer Science, Economics
ICML
• 2022
This work provides empirical evidence, in both tabular settings and function-approximation settings from pixels, that self-inconsistency is useful as a signal for exploration, for acting safely under distribution shifts, and for robustifying value-based planning with a learned model.
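As referenced in the self-consistency entry above (NeurIPS 2021), the joint model/value consistency idea can be sketched in the tabular setting. This is a hedged illustration under assumed tabular dynamics, not the paper's implementation; `self_consistency_loss` and its arguments are hypothetical names.

```python
import numpy as np

def self_consistency_loss(V, P_model, r, pi, gamma=0.9):
    """Squared Bellman residual of V under the *learned* model
    P_model[a, s, s'] with rewards r[s, a] and policy pi[s, a]."""
    next_v = np.einsum("ast,t->sa", P_model, V)        # E_model[V(s') | s, a]
    backup = np.einsum("sa,sa->s", pi, r + gamma * next_v)
    return float(np.sum((V - backup) ** 2))
```

Minimizing this residual jointly with respect to both `V` and `P_model` is what distinguishes it from an ordinary temporal-difference loss, which treats the dynamics as fixed.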
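The VaGraM entry above (ICLR 2022) also lends itself to a short sketch. The code below shows a value-gradient weighted loss in the spirit of that method, obtained from a first-order Taylor expansion of the value-aware error (V(ŝ') − V(s'))²; the toy value function and all names are assumptions for illustration, not the authors' code.

```python
import numpy as np

def value_fn(s):
    return -float(np.sum(s ** 2))   # toy value: negative squared distance

def value_grad(s):
    return -2.0 * s                 # analytic gradient of the toy value

def value_gradient_loss(pred_next, true_next):
    """First-order surrogate of (V(pred_next) - V(true_next))^2: the
    model's prediction error projected onto the value gradient."""
    g = value_grad(true_next)
    return float((g @ (pred_next - true_next)) ** 2)

s_next = np.array([0.5, -1.0])      # observed next state
s_hat  = np.array([0.4, -1.1])      # model's prediction
print(value_gradient_loss(s_hat, s_next))   # 0.01
```

Because the error is projected onto the value gradient, prediction mistakes along state dimensions where the value is flat (e.g., distracting dimensions) contribute nothing, matching the motivation stated in the entry.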

## References

Showing 1-10 of 51 references.

• Computer Science
AAAI
• 2021
This paper argues that the value prediction problems faced by an RL agent should not be addressed in isolation, but rather as a single, holistic prediction problem, and demonstrates that a representation that spans the past value-improvement path will also provide an accurate value approximation for future policy improvements.
• Computer Science
ArXiv
• 2020
This paper examines how the planning module of an MBRL algorithm uses the model, and proposes that the model learning module should incorporate the way the planner is going to use the model.
• Computer Science
ICML
• 2019
This paper presents a method for learning representations that are suitable for iterative model-based policy improvement, even when the underlying dynamical system has complex dynamics and image observations, in that these representations are optimized for inferring simple dynamics and cost models given data from the current policy.
• Computer Science
L4DC
• 2020
This paper proposes a model-based RL algorithm based on the optimism principle, and derives a bound on the regret which is independent of the total number of states or actions and is close to the lower bound $\Omega(\sqrt{HdT})$.
• Computer Science
ICLR
• 2018
This paper proposes TreeQN, a differentiable, recursive, tree-structured model that serves as a drop-in replacement for any value function network in deep RL with discrete actions, and ATreeC, an actor-critic variant that augments TreeQN with a softmax layer to form a stochastic policy network.
• Computer Science
2013 IEEE International Conference on Robotics and Automation
• 2013
An algorithm is presented for which the highest performing model from the model class is guaranteed to be found given unlimited data and computation, by explicitly selecting the model which achieves the highest expected reward, rather than the most likely model.
A new model-based reinforcement learning (MBRL) framework, called Iterative VAML, that incorporates the underlying decision problem in learning the transition model of the environment and benefits from the structure of how the planning is performed (i.e., through approximate value iteration) to devise a simpler optimization problem.
This book focuses on those algorithms of reinforcement learning that build on the powerful theory of dynamic programming, gives a fairly comprehensive catalog of learning problems, and describes the core ideas, followed by a discussion of their theoretical properties and limitations.
• Computer Science
AISTATS
• 2017
This work argues that estimating a generative model by minimizing a probabilistic loss, such as the log-loss, is overkill because it does not take into account the underlying structure of the decision problem or the RL algorithm that intends to solve it; the toy example below illustrates the point.
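Here is a small numerical sketch of that point, with made-up numbers chosen purely for illustration: two candidate next-state distributions can score almost identically under the log-loss while inducing very different Bellman backups under a given value function.

```python
import numpy as np

p_true = np.array([0.5, 0.3, 0.2])   # true next-state distribution
p_a    = np.array([0.3, 0.5, 0.2])   # candidate model A
p_b    = np.array([0.4, 0.2, 0.4])   # candidate model B
v      = np.array([0.0, 0.0, 10.0])  # value concentrated on one state

kl = lambda p, q: float(np.sum(p * np.log(p / q)))   # log-loss gap vs truth
value_gap = lambda q: float(abs((q - p_true) @ v))   # value-aware discrepancy

for name, q in [("A", p_a), ("B", p_b)]:
    print(name, round(kl(p_true, q), 4), round(value_gap(q), 4))
# A: KL ~0.1022, value gap 0.0  (worse log-loss, exact Bellman backup)
# B: KL ~0.0946, value gap 2.0  (slightly better log-loss, poor for planning)
```

The log-loss ordering ranks model B above model A, yet A reproduces the expected next-state value exactly while B misses it badly, which is precisely the mismatch value-aware model learning is designed to avoid.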