# The Value Equivalence Principle for Model-Based Reinforcement Learning

@article{Grimm2020TheVE, title={The Value Equivalence Principle for Model-Based Reinforcement Learning}, author={Christopher Grimm and Andr{\'e} Barreto and Satinder Singh and David Silver}, journal={ArXiv}, year={2020}, volume={abs/2011.03506} }

Learning models of the environment from data is often viewed as an essential component to building intelligent reinforcement learning (RL) agents. The common practice is to separate the learning of the model from its use, by constructing a model of the environment's dynamics that correctly predicts the observed state transitions. In this paper we argue that the limited representational resources of model-based RL agents are better used to build models that are directly useful for value-based…
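As a rough illustration of what "directly useful for value-based planning" can mean: in the full paper, two models are called value equivalent with respect to a set of policies and a set of functions if they yield the same Bellman updates. The sketch below checks this for a tabular toy problem. The function names, array layout, and toy numbers are assumptions chosen for illustration, not the authors' code.

```python
import numpy as np

def bellman_update(P, r, policy, v, gamma=0.9):
    """Apply the Bellman operator T_pi to v under the model (P, r).

    P: transition probabilities, shape (A, S, S)
    r: expected rewards, shape (S, A)
    policy: action probabilities, shape (S, A)
    v: value function, shape (S,)
    """
    q = r + gamma * np.einsum("ast,t->sa", P, v)  # action values, shape (S, A)
    return (policy * q).sum(axis=1)

def value_equivalent(model_a, model_b, policies, functions, gamma=0.9, tol=1e-8):
    """True if both models give the same Bellman update for every (policy, v) pair."""
    return all(
        np.allclose(bellman_update(*model_a, pi, v, gamma),
                    bellman_update(*model_b, pi, v, gamma), atol=tol)
        for pi in policies for v in functions
    )

# Hypothetical toy problem: 3 states, 1 action, zero rewards.
r = np.zeros((3, 1))
policy = np.ones((3, 1))
P_true = np.array([[[0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]])
P_model = np.array([[[0.0, 0.5, 0.5], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]])
true_model, learned_model = (P_true, r), (P_model, r)

v_same = np.array([0.0, 1.0, 1.0])  # does not distinguish states 1 and 2
v_diff = np.array([0.0, 1.0, 0.0])  # distinguishes states 1 and 2

print(value_equivalent(true_model, learned_model, [policy], [v_same]))
print(value_equivalent(true_model, learned_model, [policy], [v_same, v_diff]))
```

The learned model gets the transition out of state 0 wrong, yet it remains value equivalent to the true model as long as the function set cannot tell states 1 and 2 apart. Adding `v_diff` to the set breaks the equivalence.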

## 39 Citations

### Proper Value Equivalence

- Computer Science
- NeurIPS
- 2021

A loss function is constructed for learning PVE models and it is argued that popular algorithms such as MuZero can be understood as minimizing an upper bound for this loss.

### Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning

- Computer Science
- ArXiv
- 2022

An algorithm is introduced that iteratively computes an approximately value-equivalent, lossy compression of the environment which an agent may feasibly target in lieu of the true model, and an information-theoretic, Bayesian regret bound is proved for this algorithm that holds for any finite-horizon, episodic sequential decision-making problem.

### Model-Advantage Optimization for Model-Based Reinforcement Learning

- Computer Science
- ArXiv
- 2021

This work proposes a novel value-aware objective that is an upper bound on the absolute performance difference of a policy across two models, along with a general-purpose algorithm that modifies the standard MBRL pipeline to enable learning with value-aware objectives.

### Self-Consistent Models and Values

- Computer Science
- NeurIPS
- 2021

This work investigates a way of augmenting model-based RL by additionally encouraging a learned model and value function to be jointly self-consistent, and finds that, with appropriate choices, self-consistency helps both policy evaluation and control.

### On the role of planning in model-based deep reinforcement learning

- Computer Science
- ICLR
- 2021

This paper studies the performance of MuZero, a state-of-the-art model-based reinforcement learning algorithm with strong connections and overlapping components with many other MBRL algorithms, and suggests that planning alone is insufficient to drive strong generalization.

### Model-based Reinforcement Learning: A Survey

- Computer Science
- Found. Trends Mach. Learn.
- 2023

A survey of the integration of reinforcement learning and planning, better known as model-based reinforcement learning, is presented, along with a broad conceptual overview of planning-learning combinations for MDP optimization.

### Value Gradient weighted Model-Based Reinforcement Learning

- Computer Science
- ICLR
- 2022

The Value-Gradient weighted Model loss (VaGraM) is proposed, a novel method for value-aware model learning which improves the performance of MBRL in challenging settings, such as small model capacity and the presence of distracting state dimensions.

### Model-Advantage and Value-Aware Models for Model-Based Reinforcement Learning: Bridging the Gap in Theory and Practice

- Computer Science
- 2021

This work identifies the issue of stale value estimates that arises when value-aware objectives are naively substituted for maximum likelihood in Dyna-style model-based RL algorithms, and proposes a remedy that bridges the long-standing gap between theory and practice of value-aware model learning.

### Understanding Decision-Time vs. Background Planning in Model-Based Reinforcement Learning

- Computer Science
- ArXiv
- 2022

It is suggested that although decision-time planning does not perform as well as background planning in their classical instantiations, in their modern instantiations it can perform on par with or better than background planning in both the planning & learning and transfer learning settings.

### Model-Value Inconsistency as a Signal for Epistemic Uncertainty

- Computer Science, Economics
- ICML
- 2022

This work provides empirical evidence in both tabular and function approximation settings from pixels that self-inconsistency is useful as a signal for exploration, for acting safely under distribution shifts, and for robustifying value-based planning with a learned model.

## References

Showing 1-10 of 51 references.

### The Value-Improvement Path: Towards Better Representations for Reinforcement Learning

- Computer Science
- AAAI
- 2021

This paper argues that the value prediction problems faced by an RL agent should not be addressed in isolation, but rather as a single, holistic prediction problem, and demonstrates that a representation that spans the past value-improvement path will also provide an accurate value approximation for future policy improvements.

### Policy-Aware Model Learning for Policy Gradient Methods

- Computer Science
- ArXiv
- 2020

This paper examines how the planning module of an MBRL algorithm uses the model, and proposes that the model learning module should incorporate the way the planner is going to use the model.

### SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning

- Computer Science
- ICML
- 2019

This paper presents a method for learning representations that are suitable for iterative model-based policy improvement, even when the underlying dynamical system has complex dynamics and image observations, in that these representations are optimized for inferring simple dynamics and cost models given data from the current policy.

### Model-Based Reinforcement Learning with Value-Targeted Regression

- Computer Science
- L4DC
- 2020

This paper proposes a model-based RL algorithm based on the optimism principle and derives a regret bound that is independent of the total number of states or actions and is close to a lower bound $\Omega(\sqrt{HdT})$.

### TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning

- Computer Science
- ICLR
- 2018

TreeQN, a differentiable, recursive, tree-structured model that serves as a drop-in replacement for any value function network in deep RL with discrete actions, is proposed, along with ATreeC, an actor-critic variant that augments TreeQN with a softmax layer to form a stochastic policy network.

### Reinforcement learning with misspecified model classes

- Computer Science
- 2013 IEEE International Conference on Robotics and Automation
- 2013

An algorithm is presented for which the highest performing model from the model class is guaranteed to be found given unlimited data and computation, by explicitly selecting the model which achieves the highest expected reward, rather than the most likely model.

### Iterative Value-Aware Model Learning

- Computer Science
- NeurIPS
- 2018

A new model-based reinforcement learning (MBRL) framework, called Iterative VAML, is introduced that incorporates the underlying decision problem in learning the transition model of the environment and benefits from the structure of how planning is performed (i.e., through approximate value iteration) to devise a simpler optimization problem.

### Algorithms for Reinforcement Learning

- Computer Science
- Algorithms for Reinforcement Learning
- 2010

This book focuses on those algorithms of reinforcement learning that build on the powerful theory of dynamic programming, and gives a fairly comprehensive catalog of learning problems, and describes the core ideas, followed by the discussion of their theoretical properties and limitations.

### Value-Aware Loss Function for Model-based Reinforcement Learning

- Computer Science
- AISTATS
- 2017

This work argues that estimating a generative model that minimizes a probabilistic loss, such as the log-loss, is overkill because it does not take into account the underlying structure of the decision problem and the RL algorithm that intends to solve it.

### Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning

- Computer Science
- Artif. Intell.
- 1999