Corpus ID: 221006342

Offline Meta Reinforcement Learning

R. Dorfman and Aviv Tamar
Consider the following problem, which we term Offline Meta Reinforcement Learning (OMRL): given the complete training histories of $N$ conventional RL agents, trained on $N$ different tasks, design a learning agent that can quickly maximize reward in a new, unseen task from the same task distribution. In particular, while each conventional RL agent explored and exploited its own different task, the OMRL agent must identify regularities in the data that lead to effective exploration/exploitation…
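To make the setting concrete, the OMRL input can be pictured as one logged replay buffer per training task, with no further environment interaction allowed. The names below are a hypothetical sketch of the data layout, not from the paper:

```python
from collections import namedtuple

# one logged transition from a conventional RL agent's training history
Transition = namedtuple("Transition", "obs action reward next_obs done")

def omrl_dataset(histories):
    # histories: {task_id: [Transition, ...]} -- the complete training history
    # of one single-task RL agent per task; the OMRL learner sees only this
    return {task: list(traj) for task, traj in histories.items()}
```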
Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization
This work enforces behavior regularization on the learned policy as a general approach to offline RL, combines it with a deterministic context encoder for efficient task inference, and proposes a novel negative-power distance metric on a bounded context embedding space, whose gradient propagation is detached from that of the Bellman backup.
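As a rough illustration of the distance-metric idea (not the paper's exact objective), such a loss can pull same-task context embeddings together with a squared distance and push different-task embeddings apart with an inverse-power term; the function names, the exponent `power`, and the stabilizer `eps` here are all hypothetical:

```python
def sq_dist(z1, z2):
    # squared Euclidean distance between two context embeddings
    return sum((a - b) ** 2 for a, b in zip(z1, z2))

def metric_loss(z1, z2, same_task, power=0.1, eps=1e-6):
    if same_task:
        # pull embeddings of transitions from the same task together
        return sq_dist(z1, z2)
    # push different-task embeddings apart: an inverse-power penalty
    # that decays as the embeddings separate
    return 1.0 / (sq_dist(z1, z2) ** power + eps)
```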
Improved Context-Based Offline Meta-RL with Attention and Contrastive Learning
This work improves upon one of the SOTA OMRL algorithms, FOCAL, by incorporating an intra-task attention mechanism and inter-task contrastive learning objectives for more effective task inference and learning of control.
Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices
This work constructs an exploitation objective that automatically identifies task-relevant information and an exploration objective that recovers only this information, avoiding local optima in end-to-end training without sacrificing optimal exploration.
Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning
This work proposes HyperX, a novel method for meta-learning in sparse-reward tasks that incentivises the agent to explore in approximate hyper-state space, i.e., the joint state and approximate belief space, where the beliefs are over tasks.
Offline Meta-Reinforcement Learning with Advantage Weighting
This paper introduces the offline meta-reinforcement learning (offline meta-RL) problem setting and proposes MACAW, an optimization-based meta-learning algorithm that performs well in this setting by using simple, supervised regression objectives for both the inner and outer loop of meta-training.
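The advantage-weighting idea behind such objectives can be sketched as an exponentially weighted supervised loss: regression toward the logged action, weighted so that high-advantage actions dominate. This is a simplified illustration under assumed forms (clipped exponential weight, squared action error), not the paper's implementation:

```python
import math

def awr_weight(advantage, temperature=1.0, clip=20.0):
    # exponential advantage weight, clipped for numerical stability
    return min(math.exp(advantage / temperature), clip)

def weighted_regression_loss(pred_action, data_action, advantage):
    # supervised regression toward the logged action, scaled by the
    # advantage weight so better-than-average actions are imitated more
    w = awr_weight(advantage)
    return w * sum((p - a) ** 2 for p, a in zip(pred_action, data_action))
```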
Offline Meta-Reinforcement Learning with Online Self-Supervision
A hybrid offline meta-RL algorithm is proposed, which uses offline data with rewards to meta-train an adaptive policy and then collects additional unsupervised online data, without any ground-truth reward labels, to bridge the distribution shift between offline meta-training and online adaptation.


Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
This paper develops an off-policy meta-RL algorithm that disentangles task inference and control and performs online probabilistic filtering of latent task variables to infer how to solve a new task from small amounts of experience.
Meta reinforcement learning as task inference
This work proposes a method that separately learns the policy and the task belief by taking advantage of various kinds of privileged information, which can be very effective at solving standard meta-RL environments, as well as a complex continuous control environment with sparse rewards and requiring long-term memory.
RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning
This paper proposes to represent a "fast" reinforcement learning algorithm as a recurrent neural network (RNN) and learn it from data, encoded in the weights of the RNN, which are learned slowly through a general-purpose ("slow") RL algorithm.
Meta-Q-Learning
MQL draws upon ideas in propensity estimation to amplify the amount of data available for adaptation; experiments on standard continuous-control benchmarks suggest that it compares favorably with the state of the art in meta-RL.
Meta Reinforcement Learning from observational data.
Pre-training is transformative in supervised learning: a large network trained with large and existing datasets can be used as an initialization when learning a new task. Such initialization speeds…
Meta-Reinforcement Learning of Structured Exploration Strategies
This work introduces a novel gradient-based fast adaptation algorithm -- model agnostic exploration with structured noise (MAESN) -- to learn exploration strategies from prior experience that are informed by prior knowledge and are more effective than random action-space noise.
VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning
This paper introduces variational Bayes-Adaptive Deep RL (variBAD), a way to meta-learn to perform approximate inference in an unknown environment and to incorporate task uncertainty directly during action selection; it achieves higher online return than existing methods.
Off-Policy Deep Reinforcement Learning without Exploration
This paper introduces a novel class of off-policy algorithms, batch-constrained reinforcement learning, which restricts the action space in order to force the agent towards behaving close to on-policy with respect to a subset of the given data.
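The batch-constrained selection rule can be sketched as filtering candidate actions by a behavior model's likelihood before maximizing Q, as in the discrete variant of the idea; `q_value`, `likelihood`, and the threshold `tau` below are hypothetical stand-ins for the paper's learned generative model and critics:

```python
def batch_constrained_action(state, candidate_actions, q_value, likelihood, tau=0.3):
    # keep only candidates the behavior model considers sufficiently likely
    # relative to the most likely action, then pick the highest-Q survivor
    max_l = max(likelihood(state, a) for a in candidate_actions)
    allowed = [a for a in candidate_actions if likelihood(state, a) >= tau * max_l]
    return max(allowed, key=lambda a: q_value(state, a))
```

An unconstrained maximizer would pick the highest-Q action even where the data gives no support; the filter keeps the agent close to actions the batch actually contains.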
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
This paper proposes soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework, which achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods.
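The maximum-entropy objective can be illustrated on a discrete toy case: the soft state value adds an entropy bonus, weighted by a temperature `alpha`, to the expected Q-value. A minimal sketch of the value definition, not the SAC implementation:

```python
import math

def soft_value(q_values, log_probs, alpha=0.2):
    # soft state value under a discrete policy pi:
    #   V(s) = sum_a pi(a|s) * (Q(s, a) - alpha * log pi(a|s))
    # the -alpha * log pi term is the entropy bonus with temperature alpha
    return sum(math.exp(lp) * (q - alpha * lp) for q, lp in zip(q_values, log_probs))
```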
An analytic solution to discrete Bayesian reinforcement learning
This work proposes a new algorithm, called BEETLE, for effective online learning that is computationally efficient while minimizing the amount of exploration; it takes a Bayesian model-based approach, framing RL as a partially observable Markov decision process.