Corpus ID: 226278317

The Value Equivalence Principle for Model-Based Reinforcement Learning

@article{Grimm2020TheVE,
  title={The Value Equivalence Principle for Model-Based Reinforcement Learning},
  author={Christopher Grimm and Andr{\'e} Barreto and Satinder Singh and David Silver},
  journal={ArXiv},
  year={2020},
  volume={abs/2011.03506}
}
Learning models of the environment from data is often viewed as an essential component to building intelligent reinforcement learning (RL) agents. The common practice is to separate the learning of the model from its use, by constructing a model of the environment's dynamics that correctly predicts the observed state transitions. In this paper we argue that the limited representational resources of model-based RL agents are better used to build models that are directly useful for value-based… 
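As a rough sketch of the principle (notation adapted for illustration, not quoted from the paper), a model $\tilde{m}$ is value-equivalent to the true environment $m$ with respect to a set of policies $\Pi$ and a set of functions $\mathcal{V}$ if the Bellman operators they induce agree on those functions:

$\mathcal{T}^{\tilde{m}}_{\pi} v = \mathcal{T}^{m}_{\pi} v \quad \text{for all } \pi \in \Pi,\ v \in \mathcal{V},$

where $(\mathcal{T}^{m}_{\pi} v)(s) = \mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[ r(s,a) + \gamma\, \mathbb{E}_{s' \sim p_{m}(\cdot \mid s,a)}[v(s')] \big]$. Under this view the model only has to reproduce the Bellman updates relevant to the values and policies of interest, not every detail of the observed transitions.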
Proper Value Equivalence
TLDR
A loss function is constructed for learning PVE models and it is argued that popular algorithms such as MuZero can be understood as minimizing an upper bound for this loss.
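One plausible way to write the condition behind proper value equivalence (again a hedged sketch, not the paper's exact notation): since repeated application of the Bellman operator gives $(\mathcal{T}^{m}_{\pi})^{k} v \to v^{m}_{\pi}$ for any starting $v$, a proper value equivalent (PVE) model must satisfy

$v^{\tilde{m}}_{\pi} = v^{m}_{\pi} \quad \text{for all } \pi \in \Pi,$

and a natural learning loss penalizes the gap $\sum_{\pi \in \Pi} \lVert v^{m}_{\pi} - v^{\tilde{m}}_{\pi} \rVert$; the cited result is that MuZero-style training can be read as minimizing an upper bound on a loss of this kind.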
Model-Advantage Optimization for Model-Based Reinforcement Learning
TLDR
This work proposes a novel value-aware objective that is an upper bound on the absolute performance difference of a policy across two models and proposes a general purpose algorithm that modifies the standard MBRL pipeline – enabling learning with value aware objectives.
Self-Consistent Models and Values
TLDR
This work investigates a way of augmenting model-based RL by additionally encouraging a learned model and value function to be jointly self-consistent, and finds that, with appropriate choices, self-consistency helps both policy evaluation and control.
Value Gradient weighted Model-Based Reinforcement Learning
TLDR
The Value-Gradient weighted Model loss (VaGraM) is proposed, a novel method for value-aware model learning which improves the performance of MBRL in challenging settings, such as small model capacity and the presence of distracting state dimensions.
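A hedged sketch of the kind of objective this describes (symbols illustrative): rather than a plain squared error on predicted next states, the model error is projected onto the gradient of the learned value function at the observed next state,

$\mathcal{L}_{\text{VaGraM}}(\hat{s}', s') \approx \big( \nabla_{s} \hat{V}(s)\big|_{s = s'}^{\top} (\hat{s}' - s') \big)^{2},$

so state dimensions that barely affect the value (e.g., distractor dimensions) contribute little to the model loss even when the model has limited capacity.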
Model-Advantage and Value-Aware Models for Model-Based Reinforcement Learning: Bridging the Gap in Theory and Practice
TLDR
This work identifies the issue of stale value estimates that arises when value-aware objectives are naively substituted for maximum likelihood in Dyna-style model-based RL algorithms, and proposes a remedy that bridges the long-standing gap between the theory and practice of value-aware model learning.
Model-Value Inconsistency as a Signal for Epistemic Uncertainty
TLDR
This work provides empirical evidence in both tabular and function approximation settings from pixels that self-inconsistency is useful as a signal for exploration, for acting safely under distribution shifts, and for robustifying value-based planning with a learned model.
Procedural Generalization by Planning with Self-Supervised World Models
TLDR
Overall, this work suggests that building generalizable agents requires moving beyond the single-task, model-free paradigm and towards self-supervised model-based agents that are trained in rich, procedural, multi-task environments.
High-Accuracy Model-Based Reinforcement Learning, a Survey
TLDR
This paper surveys model-based reinforcement learning methods, explaining in detail how they work and what their strengths and weaknesses are, and concludes with a research agenda for future work to make the methods more robust and more widely applicable to other applications.
Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation
TLDR
This work proposes an end-to-end approach for model learning that directly optimizes the expected returns using implicit differentiation, and provides theoretical and empirical evidence highlighting the benefits of this approach over likelihood-based methods in the model misspecification regime.
Visualizing MuZero Models
TLDR
This paper visualizes the latent representation of MuZero agents and finds that action trajectories may diverge between observation embeddings and internal state transition dynamics, which could lead to instability during planning.

References

SHOWING 1-10 OF 51 REFERENCES
SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning
TLDR
This paper presents a method for learning representations that are suitable for iterative model-based policy improvement, even when the underlying dynamical system has complex dynamics and image observations: the representations are optimized for inferring simple dynamics and cost models given data from the current policy.
Model-Based Reinforcement Learning with Value-Targeted Regression
TLDR
This paper proposes a model-based RL algorithm based on the optimism principle and derives a regret bound that is independent of the total number of states or actions and close to the lower bound $\Omega(\sqrt{HdT})$.
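A hedged reading of what "value-targeted regression" means here (not the authors' exact formulation): the transition model is fit by regressing the model-predicted expected value of the next state onto the realized value of the observed next state,

$\hat{\theta} \approx \arg\min_{\theta} \sum_{t} \big( \mathbb{E}_{s' \sim p_{\theta}(\cdot \mid s_t, a_t)}[V(s')] - V(s_{t+1}) \big)^{2},$

rather than by maximizing the likelihood of $s_{t+1}$ itself, so the model is trained only on the quantity that planning actually consumes.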
TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning
TLDR
This work proposes TreeQN, a differentiable, recursive, tree-structured model that serves as a drop-in replacement for any value function network in deep RL with discrete actions, and ATreeC, an actor-critic variant that augments TreeQN with a softmax layer to form a stochastic policy network.
Reinforcement learning with misspecified model classes
TLDR
An algorithm is presented for which the highest performing model from the model class is guaranteed to be found given unlimited data and computation, by explicitly selecting the model which achieves the highest expected reward, rather than the most likely model.
Iterative Value-Aware Model Learning
TLDR
This work presents Iterative VAML, a new model-based reinforcement learning (MBRL) framework that incorporates the underlying decision problem into learning the transition model of the environment and exploits the structure of how planning is performed (i.e., through approximate value iteration) to devise a simpler optimization problem.
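A hedged sketch of the kind of objective this suggests (illustrative notation): at iteration $k$ of approximate value iteration, the model is fit against the current value estimate $\hat{V}_{k}$ instead of a worst case over a whole function class,

$\mathcal{L}_{k}(\hat{p}) \approx \mathbb{E}_{(s,a) \sim \mathcal{D}}\Big[ \big( \mathbb{E}_{s' \sim \hat{p}(\cdot \mid s,a)}[\hat{V}_{k}(s')] - \mathbb{E}_{s' \sim p(\cdot \mid s,a)}[\hat{V}_{k}(s')] \big)^{2} \Big],$

which is the sense in which the optimization problem becomes simpler than the original value-aware loss listed below.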
Algorithms for Reinforcement Learning
TLDR
This book focuses on those reinforcement learning algorithms that build on the powerful theory of dynamic programming, gives a fairly comprehensive catalog of learning problems, and describes the core ideas, followed by a discussion of their theoretical properties and limitations.
Value-Aware Loss Function for Model-based Reinforcement Learning
TLDR
This work argues that estimating a generative model by minimizing a probabilistic loss, such as the log-loss, is overkill because it does not take into account the underlying structure of the decision problem and the RL algorithm that intends to solve it.
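A hedged sketch of the value-aware loss this line of work proposes (illustrative notation): the model is judged by the worst-case discrepancy it induces in expected values over a function class $\mathcal{F}$ that the value function is assumed to live in,

$\mathcal{L}(\hat{p}) \approx \mathbb{E}_{(s,a) \sim \mathcal{D}}\Big[ \sup_{v \in \mathcal{F}} \big( \mathbb{E}_{s' \sim \hat{p}(\cdot \mid s,a)}[v(s')] - \mathbb{E}_{s' \sim p(\cdot \mid s,a)}[v(s')] \big)^{2} \Big],$

in contrast to the log-loss, which penalizes all transition errors equally regardless of their effect on values.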
Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping
TLDR
This paper develops an explicitly model-based approach that extends the Dyna architecture to linear function approximation, and proves that, under natural conditions, linear Dyna-style planning converges to a unique solution independent of the generating distribution.
Value Prediction Network
TLDR
This paper proposes a novel deep reinforcement learning architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network, which outperforms Deep Q-Network on several Atari games even with short-lookahead planning.