Corpus ID: 239998160

Towards Robust Bisimulation Metric Learning

Mete Kemertas, Tristan Aumentado-Armstrong
Learned representations in deep reinforcement learning (DRL) have to extract task-relevant information from complex observations, balancing robustness to distraction against informativeness to the policy. Such stable and rich representations, often learned via modern function approximation techniques, can enable practical application of the policy improvement theorem, even in high-dimensional continuous state-action spaces. Bisimulation metrics offer one solution to this representation…
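To make the abstract's central object concrete, here is a minimal sketch (not the paper's method) of computing a bisimulation metric by fixed-point iteration on a small deterministic MDP, where the usual Wasserstein term reduces to the distance between successor states; `rewards` and `next_state` are hypothetical toy inputs:

```python
import numpy as np

def bisim_metric(rewards, next_state, gamma=0.9, iters=200):
    """Fixed-point iteration for a bisimulation metric on a deterministic MDP:
    d(i, j) = |r_i - r_j| + gamma * d(next_i, next_j)."""
    n = len(rewards)
    d = np.zeros((n, n))
    reward_diff = np.abs(rewards[:, None] - rewards[None, :])
    for _ in range(iters):
        # d[next_state][:, next_state] indexes d(next_i, next_j) for all pairs
        d = reward_diff + gamma * d[next_state][:, next_state]
    return d
```

For self-looping states the recursion solves to `|r_i - r_j| / (1 - gamma)`, so states with identical rewards and dynamics collapse to distance zero, which is the aggregation behaviour bisimulation-based representations exploit.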
1 Citation


A Survey of Generalisation in Deep Reinforcement Learning
It is argued that a purely procedural content generation approach to benchmark design is not conducive to progress in generalisation, and fast online adaptation and tackling RL-specific problems are suggested as areas for future work on methods for generalisation.


Plannable Approximations to MDP Homomorphisms: Equivariance under Actions
A contrastive loss function is introduced that enforces action equivariance on the learned representations, and it is proved that when the loss is zero, the optimal policy in the abstract MDP can be successfully lifted to the original MDP.
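An action-equivariance contrastive objective of the kind described above might be sketched as a hinge loss (the names `z_pred`, `z_next`, and `z_neg` are illustrative, not the paper's notation):

```python
import numpy as np

def equivariance_hinge_loss(z_pred, z_next, z_neg, margin=1.0):
    """Pull the predicted next latent toward the encoded true next state
    (action equivariance); push it at least `margin` away from negatives."""
    pos = np.sum((z_pred - z_next) ** 2, axis=-1)
    neg = np.maximum(0.0, margin - np.sqrt(np.sum((z_pred - z_neg) ** 2, axis=-1)))
    return float(np.mean(pos + neg))
```

When the positive term is zero, acting in latent space and then encoding agrees with encoding and then acting — the equivariance property the lifting result relies on.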
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
This paper considers the challenging Atari games domain and proposes a new exploration method based on assigning exploration bonuses from a concurrently learned model of the system dynamics; the method provides the most consistent improvement across a range of games that pose a major challenge for prior methods.
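As a rough illustration of this idea (not the paper's exact formulation), an exploration bonus can be derived from the prediction error of a concurrently learned dynamics model; `model` here is any hypothetical callable predicting the next state:

```python
import numpy as np

def exploration_bonus(model, s, a, s_next, beta=1.0):
    """Bonus proportional to the dynamics model's squared prediction error:
    transitions the model predicts poorly are treated as novel."""
    pred = model(s, a)
    return beta * float(np.sum((pred - s_next) ** 2))
```

The bonus shrinks as the model improves on frequently visited transitions, steering the agent toward poorly modelled regions of the state space.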
Learning Actionable Representations with Goal-Conditioned Policies
Representation learning is a central challenge across a range of machine learning areas. In reinforcement learning, effective and functional representations have the potential to tremendously
Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning
This work proposes to learn a model of the MDP transition probabilities concurrently with the policy and to form intrinsic rewards that approximate the KL divergence of the true transition probabilities from the learned model, which amounts to using surprisal as intrinsic motivation.
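A minimal tabular sketch of surprisal as an intrinsic reward, assuming a count-based transition model with Laplace smoothing (the setup is illustrative; the paper works with learned deep models):

```python
import numpy as np

def surprisal_bonus(counts, s, a, s_next):
    """Intrinsic reward -log p_hat(s' | s, a), where p_hat is a count-based
    transition model with Laplace (add-one) smoothing."""
    n_states = counts.shape[2]
    p = (counts[s, a, s_next] + 1.0) / (counts[s, a].sum() + n_states)
    return float(-np.log(p))
```

Unseen transitions are maximally surprising; as a transition is observed repeatedly, its estimated probability rises and the bonus decays, so the intrinsic signal fades where the model is accurate.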
Metrics and continuity in reinforcement learning
A unified formalism for defining topologies through the lens of metrics is introduced, a hierarchy amongst these metrics is established, and their theoretical implications for the Markov Decision Process specifying the reinforcement learning problem are demonstrated.
Efficient Model-Based Deep Reinforcement Learning with Variational State Tabulation
This article introduces Variational State Tabulation (VaST), which maps an environment with a high-dimensional state space to an abstract tabular model, and shows how VaST can rapidly learn to maximize reward in tasks like 3D navigation and efficiently adapt to sudden changes in rewards or transition probabilities.
DeepMDP: Learning Continuous Latent Space Models for Representation Learning
This work introduces the concept of a DeepMDP, a parameterized latent space model trained via the minimization of two tractable losses: prediction of rewards and prediction of the distribution over next latent states. It shows that optimizing these objectives guarantees the quality of the latent space as a representation of the state space.
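The two tractable losses described above can be sketched in a toy latent-space form (the callables `reward_head` and `trans_head` are hypothetical stand-ins, not the paper's architecture):

```python
import numpy as np

def deepmdp_losses(z, z_next, a, reward, reward_head, trans_head):
    """Two DeepMDP-style losses on latent states z, z_next."""
    # Loss 1: predict the immediate reward from the latent state and action.
    l_reward = float((reward_head(z, a) - reward) ** 2)
    # Loss 2: predict the next latent state (a point prediction here; the
    # full method predicts a distribution over next latents).
    l_trans = float(np.sum((trans_head(z, a) - z_next) ** 2))
    return l_reward, l_trans
```

Driving both losses to zero means the latent space carries enough information to reproduce rewards and dynamics, which is the sense in which it is guaranteed to be a good representation.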
Online abstraction with MDP homomorphisms for Deep Learning
A new algorithm for finding abstract MDPs in environments with continuous state spaces is proposed, based on MDP homomorphisms, structure-preserving mappings between MDPs; the algorithm learns abstractions from collected experience and reuses them to guide exploration in new tasks the agent encounters.
An algebraic approach to abstraction in reinforcement learning
To operate effectively in complex environments, learning agents must ignore irrelevant details. Stated in general terms, this is a very difficult problem. Much of the work in this field is specialized to…
Decoupling Representation Learning from Reinforcement Learning
A new unsupervised learning task is introduced, called Augmented Temporal Contrast (ATC), which trains a convolutional encoder to associate pairs of observations separated by a short time difference, under image augmentations and using a contrastive loss.
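A compact sketch of the kind of contrastive objective ATC uses, an InfoNCE loss over temporally separated observation pairs (this is a generic formulation, not the authors' exact code):

```python
import numpy as np

def infonce_loss(anchors, positives, temperature=0.1):
    """anchors[i] and positives[i] encode (augmented) observations a short
    time apart; every other row in the batch serves as a negative for row i."""
    logits = anchors @ positives.T / temperature       # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))         # match each anchor to its own positive
```

Minimizing this loss makes each anchor most similar to its own future observation, so the encoder must keep temporally predictive features while augmentations discourage it from latching onto pixel-level detail.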