# The Value Equivalence Principle for Model-Based Reinforcement Learning

@article{Grimm2020TheVE, title={The Value Equivalence Principle for Model-Based Reinforcement Learning}, author={Christopher Grimm and Andr{\'e} Barreto and Satinder Singh and David Silver}, journal={ArXiv}, year={2020}, volume={abs/2011.03506} }

Learning models of the environment from data is often viewed as an essential component to building intelligent reinforcement learning (RL) agents. The common practice is to separate the learning of the model from its use, by constructing a model of the environment's dynamics that correctly predicts the observed state transitions. In this paper we argue that the limited representational resources of model-based RL agents are better used to build models that are directly useful for value-based…

## 38 Citations

### Proper Value Equivalence

- Computer Science, NeurIPS
- 2021

A loss function is constructed for learning PVE models and it is argued that popular algorithms such as MuZero can be understood as minimizing an upper bound for this loss.

### Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning

- Computer Science, ArXiv
- 2022

An algorithm is introduced that iteratively computes an approximately-value-equivalent, lossy compression of the environment which an agent may feasibly target in lieu of the true model, and an information-theoretic, Bayesian regret bound is proved for this algorithm that holds for any finite-horizon, episodic sequential decision-making problem.

### Model-Advantage Optimization for Model-Based Reinforcement Learning

- Computer Science, ArXiv
- 2021

This work proposes a novel value-aware objective that is an upper bound on the absolute performance difference of a policy across two models, and a general-purpose algorithm that modifies the standard MBRL pipeline to enable learning with value-aware objectives.

### Self-Consistent Models and Values

- Computer Science, NeurIPS
- 2021

This work investigates a way of augmenting model-based RL by additionally encouraging a learned model and value function to be jointly self-consistent, and finds that, with appropriate choices, self-consistency helps both policy evaluation and control.

### Should Models Be Accurate?

- Computer Science
- 2022

This work introduces a meta-learning algorithm for training models with a focus on their usefulness to the learner instead of their accuracy in modelling the environment, and shows that in a simple non-stationary environment, this algorithm enables faster learning than even using an accurate model built with domain-specific knowledge of the non-stationarity.

### On the role of planning in model-based deep reinforcement learning

- Computer Science, ICLR
- 2021

This paper studies the performance of MuZero, a state-of-the-art model-based reinforcement learning algorithm with strong connections and overlapping components with many other MBRL algorithms, and suggests that planning alone is insufficient to drive strong generalization.

### Model-based Reinforcement Learning: A Survey

- Computer Science, ArXiv
- 2020

A survey of the integration of planning and learning, better known as model-based reinforcement learning, and a broad conceptual overview of planning-learning combinations for MDP optimization are presented.

### Value Gradient weighted Model-Based Reinforcement Learning

- Computer Science, ICLR
- 2022

The Value-Gradient weighted Model loss (VaGraM) is proposed, a novel method for value-aware model learning which improves the performance of MBRL in challenging settings, such as small model capacity and the presence of distracting state dimensions.

### Between Rate-Distortion Theory & Value Equivalence in Model-Based Reinforcement Learning

- Computer Science, ArXiv
- 2022

This work embraces a notion of approximate value equivalence and introduces an algorithm for incrementally synthesizing simple and useful approximations of the environment from which an agent might still recover near-optimal behavior.

### Model-Advantage and Value-Aware Models for Model-Based Reinforcement Learning: Bridging the Gap in Theory and Practice

- Computer Science
- 2021

This work identifies the issue of stale value estimates that arises when value-aware objectives are naively substituted for maximum likelihood in Dyna-style model-based RL algorithms, and proposes a remedy that bridges the long-standing gap between theory and practice in value-aware model learning.

## References


### The Value-Improvement Path: Towards Better Representations for Reinforcement Learning

- Computer Science, AAAI
- 2021

This paper argues that the value prediction problems faced by an RL agent should not be addressed in isolation, but rather as a single, holistic prediction problem, and demonstrates that a representation that spans the past value-improvement path will also provide an accurate value approximation for future policy improvements.

### Policy-Aware Model Learning for Policy Gradient Methods

- Computer Science, ArXiv
- 2020

This paper examines how the planning module of an MBRL algorithm uses the model, and proposes that the model learning module should incorporate the way the planner is going to use the model.

### SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning

- Computer Science, ICML
- 2019

This paper presents a method for learning representations that are suitable for iterative model-based policy improvement, even when the underlying dynamical system has complex dynamics and image observations, in that these representations are optimized for inferring simple dynamics and cost models given data from the current policy.

### Model-Based Reinforcement Learning with Value-Targeted Regression

- Computer Science, L4DC
- 2020

This paper proposes a model-based RL algorithm built on the optimism principle and derives a regret bound that is independent of the total number of states or actions and is close to a lower bound $\Omega(\sqrt{HdT})$.

### TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning

- Computer Science, ICLR
- 2018

This paper proposes TreeQN, a differentiable, recursive, tree-structured model that serves as a drop-in replacement for any value function network in deep RL with discrete actions, and ATreeC, an actor-critic variant that augments TreeQN with a softmax layer to form a stochastic policy network.

### Reinforcement learning with misspecified model classes

- Computer Science, 2013 IEEE International Conference on Robotics and Automation
- 2013

An algorithm is presented for which the highest performing model from the model class is guaranteed to be found given unlimited data and computation, by explicitly selecting the model which achieves the highest expected reward, rather than the most likely model.

### Iterative Value-Aware Model Learning

- Computer Science, NeurIPS
- 2018

This paper introduces Iterative VAML, a model-based reinforcement learning (MBRL) framework that incorporates the underlying decision problem in learning the transition model of the environment and exploits the structure of how the planning is performed (i.e., through approximate value iteration) to devise a simpler optimization problem.

### Algorithms for Reinforcement Learning

- Computer Science, Algorithms for Reinforcement Learning
- 2010

This book focuses on those algorithms of reinforcement learning that build on the powerful theory of dynamic programming; it gives a fairly comprehensive catalog of learning problems and describes the core ideas, followed by a discussion of their theoretical properties and limitations.

### Value-Aware Loss Function for Model-based Reinforcement Learning

- Computer Science, AISTATS
- 2017

This work argues that estimating a generative model that minimizes a probabilistic loss, such as the log-loss, is overkill because it does not take into account the underlying structure of the decision problem and the RL algorithm that intends to solve it.

### Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping

- Mathematics, UAI
- 2008

This paper develops an explicitly model-based approach that extends the Dyna architecture to linear function approximation, and proves that linear Dyna-style planning converges to a unique solution independent of the generating distribution, under natural conditions.