Value Gradient weighted Model-Based Reinforcement Learning

Claas Voelcker, Victor Liao, Animesh Garg, Amir-massoud Farahmand
Model-based reinforcement learning (MBRL) is a sample-efficient technique for obtaining control policies, yet unavoidable modeling errors often lead to performance deterioration. The model in MBRL is often fitted solely to reconstruct dynamics, state observations in particular, while the impact of model error on the policy is not captured by the training objective. This leads to a mismatch between the intended goal of MBRL, enabling good policy and value learning, and the target of the loss function…
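The abstract's core idea, weighting the model's prediction error by the gradient of the learned value function so that errors in value-relevant state dimensions dominate the objective, can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation; the function and variable names (`vagram_loss`, `value_grad`, `s_pred`) are made up for the example.

```python
import numpy as np

def vagram_loss(s_pred, s_next, value_grad):
    """Value-gradient weighted model loss (sketch).

    Instead of the plain squared error ||s_pred - s_next||^2, the error
    in each state dimension is scaled by the partial derivative of the
    value function at the observed next state, so dimensions that matter
    for the value estimate contribute most to the training objective.
    """
    weighted_diff = value_grad * (s_pred - s_next)  # elementwise weighting
    return np.sum(weighted_diff ** 2)

# Toy example: a large error in a dimension with zero value gradient is
# ignored, while a small error in a value-relevant dimension is penalized.
grad = np.array([1.0, 0.0])   # value only depends on the first dimension
pred = np.array([0.5, 3.0])   # model prediction
true = np.array([0.0, 0.0])   # observed next state
loss = vagram_loss(pred, true, grad)  # only the first dimension counts
```

Compare this with a maximum-likelihood (MSE) objective, which would penalize the second dimension's error of 3.0 most heavily even though it is irrelevant to the value estimate.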


A Survey on Model-based Reinforcement Learning
This survey reviews model-based reinforcement learning (MBRL) with a focus on recent progress in deep RL, and discusses the applicability and advantages of MBRL in real-world tasks.
Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy
A novel model-based RL method, named Policy-adaptation Model-based Actor-Critic (PMAC), is proposed, which learns a policy-adapted dynamics model via a policy-adaptation mechanism that dynamically adjusts the historical policy mixture distribution, ensuring the learned model can continually adapt to the state-action visitation distribution of the evolving policy.
Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning
An algorithm is introduced that iteratively computes an approximately-value-equivalent, lossy compression of the environment which an agent may feasibly target in lieu of the true model, and an information-theoretic, Bayesian regret bound is proved for this algorithm that holds for any finite-horizon, episodic sequential decision-making problem.
VIPer: Iterative Value-Aware Model Learning on the Value Improvement Path
  • Computer Science
  • 2022
A practical and generalizable Decision-Aware Model-Based Reinforcement Learning algorithm to improve the generalization of VAML-like model learning and shows theoretically for linear and tabular spaces that the proposed algorithm is sensible, justifying extension to non-linear and continuous spaces.


Objective Mismatch in Model-based Reinforcement Learning
It is demonstrated that the likelihood of one-step ahead predictions is not always correlated with control performance, a critical limitation in the standard MBRL framework which will require further research to be fully understood and addressed.
Gradient-Aware Model-based Policy Search
A novel model-based policy search approach that exploits the knowledge of the current agent policy to learn an approximate transition model, focusing on the portions of the environment that are most relevant for policy improvement.
Iterative Value-Aware Model Learning
A new model-based reinforcement learning (MBRL) framework that incorporates the underlying decision problem in learning the transition model of the environment, called Iterative VAML, that benefits from the structure of how the planning is performed (i.e., through approximate value iteration) to devise a simpler optimization problem.
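The VAML family of losses summarized above trains the model so that the value of the model-predicted next state matches the value at the observed next state, rather than matching the states themselves. A minimal sketch of such a loss, with illustrative names (`itervaml_loss`, `V`) that are not from the paper:

```python
import numpy as np

def itervaml_loss(V, s_pred, s_next):
    """Value-aware model loss (sketch).

    V       : current value-function estimate, mapping states to scalars
    s_pred  : batch of model-predicted next states, shape (batch, dim)
    s_next  : batch of observed next states, shape (batch, dim)

    The model is penalized only for prediction errors that change the
    value estimate, not for raw state-reconstruction error.
    """
    return np.mean((V(s_pred) - V(s_next)) ** 2)

# Toy value function: the value is the sum of the state coordinates.
V = lambda s: np.sum(s, axis=-1)
s_pred = np.array([[1.0, 1.0], [0.0, 0.0]])
s_next = np.array([[2.0, 0.0], [1.0, 0.0]])
loss = itervaml_loss(V, s_pred, s_next)
```

Note that the first predicted state differs from the observed one yet incurs no loss, because both have the same value under `V`; a reconstruction-based objective would penalize it.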
Decision-Aware Model Learning for Actor-Critic Methods: When Theory Does Not Meet Practice
The results suggest that, despite theoretical guarantees, learning a value-aware model in continuous domains does not ensure better performance on the overall task and that naive approaches such as maximum likelihood estimation often achieve superior performance with less computational cost.
Value-Aware Loss Function for Model-based Reinforcement Learning
This work argues that estimating a generative model that minimizes a probabilistic loss, such as the log-loss, is overkill because it does not take into account the underlying structure of the decision problem and the RL algorithm that intends to solve it.
The Value Equivalence Principle for Model-Based Reinforcement Learning
It is argued that the limited representational resources of model-based RL agents are better used to build models that are directly useful for value-based planning, and the principle of value equivalence underlies a number of recent empirical successes in RL.
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
This paper proposes soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework, which achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods.
Learning Dynamics Models for Model Predictive Agents
This paper compares the performance of different design choices for learning dynamics models to planning with a ground-truth model – the simulator, and describes a set of qualitative findings, rules of thumb, and future research directions for planning with learned dynamics models.
Value Prediction Network
This paper proposes a novel deep reinforcement learning architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network and outperforms Deep Q-Network on several Atari games even with short-lookahead planning.
Goal-Aware Prediction: Learning to Model What Matters
This paper proposes to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space, resulting in a learning objective that more closely matches the downstream task.