• Corpus ID: 244953986

Model-Value Inconsistency as a Signal for Epistemic Uncertainty

  title={Model-Value Inconsistency as a Signal for Epistemic Uncertainty},
  author={Angelos Filos and Eszter V{\'e}rtes and Zita Marinho and Gregory Farquhar and Diana Borsa and Abram L. Friesen and Feryal M. P. Behbahani and Tom Schaul and Andr{\'e} Barreto and Simon Osindero},
  booktitle={International Conference on Machine Learning},
Using a model of the environment and a value function, an agent can construct many estimates of a state’s value, by unrolling the model for different lengths and bootstrapping with its value function. Our key insight is that one can treat this set of value estimates as a type of ensemble, which we call an implicit value ensemble (IVE). Consequently, the discrepancy between these estimates can be used as a proxy for the agent’s epistemic uncertainty; we term this signal model-value inconsistency… 
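
The construction described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the deterministic `model` and `value_fn` interfaces, the toy self-loop environment, and the use of the population standard deviation as the discrepancy measure are all hypothetical choices made for the example.

```python
from statistics import pstdev
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
# a learned model maps a state to (predicted_reward, predicted_next_state)
# under a fixed policy, and a learned value function maps a state to a scalar.
Model = Callable[[int], Tuple[float, int]]
ValueFn = Callable[[int], float]


def k_step_value(state: int, model: Model, value_fn: ValueFn,
                 k: int, gamma: float) -> float:
    """Unroll the model for k steps, then bootstrap with the value function."""
    total, discount = 0.0, 1.0
    for _ in range(k):
        reward, state = model(state)
        total += discount * reward
        discount *= gamma
    return total + discount * value_fn(state)


def implicit_value_ensemble(state: int, model: Model, value_fn: ValueFn,
                            max_k: int, gamma: float = 0.9) -> List[float]:
    """Collect the k-step value estimates for k = 0, 1, ..., max_k."""
    return [k_step_value(state, model, value_fn, k, gamma)
            for k in range(max_k + 1)]


def model_value_inconsistency(estimates: List[float]) -> float:
    """Discrepancy across the ensemble (assumed here: population std. dev.)."""
    return pstdev(estimates)


# Toy example: a single self-looping state that pays reward 1 at every step.
def toy_model(state: int) -> Tuple[float, int]:
    return 1.0, state


def consistent_value(state: int) -> float:
    return 2.0  # matches the model's return when gamma = 0.5: 1 / (1 - 0.5)


def zero_value(state: int) -> float:
    return 0.0  # disagrees with the model's predicted rewards


consistent = implicit_value_ensemble(0, toy_model, consistent_value, 3, gamma=0.5)
inconsistent = implicit_value_ensemble(0, toy_model, zero_value, 3, gamma=0.5)
print(consistent, model_value_inconsistency(consistent))
print(inconsistent, model_value_inconsistency(inconsistent))
```

When the value function agrees with the model, every unroll length yields the same estimate and the signal is zero. When they disagree, the estimates spread out, and that spread is what the abstract proposes as a proxy for epistemic uncertainty.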

Learning General World Models in a Handful of Reward-Free Deployments

This work introduces the reward-free deployment efficiency setting, a new paradigm for RL research, and presents CASCADE, a novel approach for self-supervised exploration in this setting that uses an information-theoretic objective inspired by Bayesian Active Learning.

Human-Timescale Adaptation in an Open-Ended Task Space

It is demonstrated that training an RL agent at scale leads to a general in-context learning algorithm that can adapt to open-ended novel embodied 3D problems as quickly as humans.



Temporal Difference Uncertainties as a Signal for Exploration

A novel method for estimating uncertainty over the value function that relies on inducing a distribution over temporal difference errors; the resulting uncertainty is incorporated as an intrinsic reward, and exploration is treated as a separate learning problem, induced by the agent's temporal difference uncertainties.

The Value Equivalence Principle for Model-Based Reinforcement Learning

It is argued that the limited representational resources of model-based RL agents are better used to build models that are directly useful for value-based planning, and the principle of value equivalence underlies a number of recent empirical successes in RL.

Model based Bayesian Exploration

This paper explicitly represents uncertainty about the parameters of the model and builds probability distributions over Q-values based on it. These distributions are used to compute a myopic approximation to the value of information for each action, and hence to select the action that best balances exploration and exploitation.

A Distributional Perspective on Reinforcement Learning

This paper argues for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent, and designs a new algorithm which applies Bellman's equation to the learning of approximate value distributions.

Self-Consistent Models and Values

This work investigates a way of augmenting model-based RL by additionally encouraging a learned model and value function to be jointly self-consistent, and finds that, with appropriate choices, self-consistency helps both policy evaluation and control.

Improving PILCO with Bayesian Neural Network Dynamics Models

PILCO’s framework is extended to use Bayesian deep dynamics models with approximate variational inference, allowing PILCO to scale linearly with number of trials and observation space dimensionality, and it is shown that moment matching is a crucial simplifying assumption made by the model.

Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion

Stochastic ensemble value expansion (STEVE), a novel model-based technique that dynamically interpolates between model rollouts of various horizon lengths for each individual example, outperforms model-free baselines on challenging continuous control benchmarks with an order-of-magnitude increase in sample efficiency.

The Value-Improvement Path: Towards Better Representations for Reinforcement Learning

This paper argues that the value prediction problems faced by an RL agent should not be addressed in isolation, but rather as a single, holistic prediction problem, and demonstrates that a representation that spans the past value-improvement path will also provide an accurate value approximation for future policy improvements.

Value-driven Hindsight Modelling

An approach for representation learning in RL is proposed to learn what to model in a way that can directly help value prediction, which provides us with tractable prediction targets that are directly relevant for a task, and can thus accelerate learning of the value function.

Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress

This work provides a "sanity check" theoretical analysis and experimental studies demonstrating the robustness of these exploration measures in non-stationary environments and in cases where the original approaches are misled by wrong domain assumptions.