Corpus ID: 231592626

Linear Representation Meta-Reinforcement Learning for Instant Adaptation

Matt Peng, Banghua Zhu, Jiantao Jiao
This paper introduces Fast Linearized Adaptive Policy (FLAP), a new meta-reinforcement learning (meta-RL) method that extrapolates well to out-of-distribution tasks without reusing data from training, and adapts almost instantaneously, needing only a few samples during testing. FLAP builds on the idea of learning a shared linear representation of the policy, so that adapting to a new task only requires predicting a set of linear weights. A separate adapter network…
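The shared-linear-representation idea can be sketched in a few lines: a feature network trained across tasks stays frozen, and each new task only supplies a fresh set of linear head weights. This is a minimal illustrative sketch, not FLAP's actual architecture or code; all names and sizes here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared nonlinear feature extractor phi(s), trained across all tasks.
# A single fixed random layer stands in for the learned network here.
W_shared = rng.normal(size=(16, 4))

def phi(state):
    """Shared representation: one hidden layer with tanh nonlinearity."""
    return np.tanh(W_shared @ state)

def policy_action(state, task_weights):
    """Task-specific policy head: a single linear map over phi(s)."""
    return task_weights @ phi(state)

# Adapting to a new task = predicting a new set of linear weights;
# everything inside phi stays frozen.
state = rng.normal(size=4)
task_weights = rng.normal(size=(2, 16))  # e.g. a 2-D action space
action = policy_action(state, task_weights)
```

The point of the decomposition is that test-time adaptation touches only `task_weights`, which is why predicting them with an adapter network can be nearly instantaneous.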
Improving Generalization in Meta-RL with Imaginary Tasks from Latent Dynamics Mixture
This work proposes LDM, which trains a reinforcement learning agent with imaginary tasks generated from mixtures of learned latent dynamics. LDM significantly outperforms standard meta-RL methods in test returns on gridworld navigation and MuJoCo tasks in which the training and test task distributions are strictly separated.
Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment
This paper augments a learned dynamics model with simple transformations that seek to capture potential changes in physical properties of the robot, leading to more robust policies.


Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling
This work presents model identification and experience relabeling (MIER), a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
This paper develops an off-policy meta-RL algorithm that disentangles task inference and control and performs online probabilistic filtering of latent task variables to infer how to solve a new task from small amounts of experience.
Continuous control with deep reinforcement learning
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems.
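The "compatible with any model trained with gradient descent" claim boils down to a two-level loop: an inner gradient step adapts to each task, and an outer step updates the initialization through the adapted losses. The sketch below is a first-order toy on a scalar least-squares task family, purely illustrative of the loop structure, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.0                       # meta-learned initialization
alpha, beta = 0.1, 0.01           # inner / outer learning rates

for _ in range(2000):
    meta_grad = 0.0
    for _ in range(4):                       # batch of sampled tasks
        target = rng.uniform(-1, 1)          # task = fit this scalar
        grad = 2 * (theta - target)          # d/dtheta of (theta - target)^2
        theta_prime = theta - alpha * grad   # inner adaptation step
        # First-order MAML: use the adapted parameters' loss gradient
        # directly, ignoring second derivatives for brevity.
        meta_grad += 2 * (theta_prime - target)
    theta -= beta * meta_grad / 4            # outer (meta) update

# theta settles near the task-distribution mean (0 here), i.e. an
# initialization from which one gradient step adapts well to any task.
```

The full algorithm differentiates through the inner update (second-order terms); the first-order variant shown is a common simplification.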
Successor Features for Transfer in Reinforcement Learning
This work proposes a transfer framework for the scenario where the reward function changes between tasks but the environment's dynamics remain the same. It derives two theorems that place the approach on firm theoretical ground, and presents experiments showing that it successfully promotes transfer in practice.
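The changed-reward/fixed-dynamics setting is handled by the successor-feature decomposition: if rewards are linear in features, r = phi(s, a, s') . w, then Q-values factor as Q(s, a) = psi(s, a) . w, where psi accumulates discounted expected future features. A minimal numerical sketch (all values here are illustrative):

```python
import numpy as np

# psi^pi(s, a): discounted expected future features under policy pi.
# It depends only on the dynamics and the policy, not on the reward.
psi = np.array([1.0, 0.5, 2.0])

w_task_a = np.array([1.0, 0.0, 0.0])  # task A reward weights
w_task_b = np.array([0.0, 1.0, 1.0])  # task B: new reward, same dynamics

# Transfer: re-evaluating the same policy under a new reward is just
# a new dot product; psi does not need to be relearned.
q_a = psi @ w_task_a   # 1.0
q_b = psi @ w_task_b   # 2.5
```

Because the dynamics are shared across tasks, learning the new task reduces to estimating the reward weights `w`, which is the source of the transfer benefit.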
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
This paper proposes soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework. It achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods.
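The maximum entropy objective the summary refers to augments the expected return with a policy-entropy bonus; in its standard form (reproduced from the general SAC formulation, not from this page):

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
\Big[\, r(s_t, a_t) + \alpha \,\mathcal{H}\big(\pi(\cdot \mid s_t)\big) \,\Big]
```

Here $\alpha$ is a temperature that trades off reward against policy entropy, encouraging exploration and robustness.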
Benchmarking Deep Reinforcement Learning for Continuous Control
This work presents a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure.
Meta-Reinforcement Learning of Structured Exploration Strategies
This work introduces a novel gradient-based fast adaptation algorithm -- model agnostic exploration with structured noise (MAESN) -- to learn exploration strategies from prior experience that are informed by prior knowledge and are more effective than random action-space noise.
End-to-End Training of Deep Visuomotor Policies
This paper develops a method that can be used to learn policies that map raw image observations directly to torques at the robot's motors, trained using a partially observed guided policy search method, with supervision provided by a simple trajectory-centric reinforcement learning method.
A Tutorial on Linear Function Approximators for Dynamic Programming and Reinforcement Learning
This article describes algorithms in a unified framework, giving pseudocode together with memory and iteration complexity analysis for each, and empirical evaluations of these techniques with four representations across four domains provide insight into how these algorithms perform with various feature sets in terms of running time and performance.
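The basic setting the tutorial surveys is value estimation with a linear approximator, V(s) = theta . phi(s), updated by temporal-difference learning. A minimal TD(0) sketch on a toy deterministic chain (an illustrative example, not one of the article's four domains):

```python
import numpy as np

# TD(0) with a linear value function V(s) = theta . phi(s).
# One-hot features make this equivalent to the tabular case, the
# simplest member of the linear-approximator family.
n_states, gamma, alpha = 5, 0.9, 0.1
phi = np.eye(n_states)          # one-hot feature vectors
theta = np.zeros(n_states)      # linear weights

s = 0
for _ in range(5000):
    s_next = min(s + 1, n_states - 1)            # deterministic chain
    r = 1.0 if s_next == n_states - 1 else 0.0   # reward on reaching the end
    td_error = r + gamma * (theta @ phi[s_next]) - theta @ phi[s]
    theta += alpha * td_error * phi[s]           # linear TD(0) update
    s = 0 if s_next == n_states - 1 else s_next  # reset at the terminal state

# theta converges to the discounted values: V(3)=1, V(2)=0.9, V(0)=0.729.
```

Swapping `phi` for a coarser feature set (tile coding, radial basis functions, etc.) changes the representation but leaves the update rule untouched, which is the unifying view the article takes.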