Corpus ID: 244954755

CoMPS: Continual Meta Policy Search

@misc{berseth2021comps,
  title={CoMPS: Continual Meta Policy Search},
  author={Glen Berseth and Zhiwei Zhang and Grace H. Zhang and Chelsea Finn and Sergey Levine},
  year={2021},
}
The estimator is importance-weighted, analogously to PPO and other importance-sampled policy gradient algorithms. We use this estimator for the inner-loop update in Algorithm 1, line 5. Our ablation experiments show that this approach is needed to enable successful meta-training using the exhaustive off-policy experience collected by CoMPS.
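As a minimal sketch of the kind of importance-weighted surrogate described above (illustrative only; the function name, clipping constant, and NumPy formulation are assumptions, not the paper's exact estimator):

```python
import numpy as np

def importance_weighted_pg_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO-style clipped, importance-weighted surrogate loss (to be minimized).

    logp_new:   log-probabilities of the taken actions under the current policy
    logp_old:   log-probabilities under the behavior (data-collecting) policy
    advantages: advantage estimates for each transition
    """
    ratio = np.exp(logp_new - logp_old)  # importance weights pi_new / pi_old
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic (elementwise min) surrogate, negated so that gradient
    # descent on this loss improves the policy objective.
    return -np.mean(np.minimum(unclipped, clipped))
```

When the current and behavior policies coincide, the ratio is 1 and the loss reduces to the ordinary advantage-weighted policy gradient surrogate; the clipping only engages as the policies diverge, which is what makes such estimators usable on off-policy data.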
1 Citation
Task-Agnostic Continual Reinforcement Learning: In Praise of a Simple Baseline
By analyzing different training statistics, including gradient conflict, this paper lays out evidence that 3RL's outperformance stems from its ability to quickly infer how new tasks relate to previous ones, enabling forward transfer.


References

Proximal Policy Optimization Algorithms
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment and optimizing a "surrogate" objective function using stochastic gradient ascent.
Evolved Policy Gradients
Empirical results show that the evolved policy gradient algorithm (EPG) achieves faster learning on several randomized environments compared to an off-the-shelf policy gradient method; its learned loss can generalize to out-of-distribution test-time tasks and exhibits qualitatively different behavior from other popular meta-learning algorithms.
ProMP: Proximal Meta-Policy Search
A novel meta-learning algorithm is developed that overcomes both the issue of poor credit assignment and previous difficulties in estimating meta-policy gradients and leads to superior pre-adaptation policy behavior and consistently outperforms previous Meta-RL algorithms in sample-efficiency, wall-clock time, and asymptotic performance.
Guided Meta-Policy Search
This paper proposes to learn a reinforcement learning procedure through imitation of expert policies that solve previously-seen tasks, and demonstrates significant improvements in meta-RL sample efficiency in comparison to prior work as well as the ability to scale to domains with visual observations.
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
This paper develops an off-policy meta-RL algorithm that disentangles task inference and control and performs online probabilistic filtering of latent task variables to infer how to solve a new task from small amounts of experience.
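The online probabilistic filtering of latent task variables mentioned above can be sketched in a heavily simplified form. The snippet below (an assumed toy model, not the cited paper's method, which uses a learned neural context encoder) filters a scalar Gaussian latent task variable from noisy observations, one transition at a time:

```python
def infer_task_posterior(context, prior_mu=0.0, prior_var=1.0, noise_var=0.5):
    """Online Gaussian filtering of a scalar latent task variable z.

    Treats each value in `context` (e.g. an observed reward) as a noisy
    observation of z and returns the posterior mean and variance.
    Illustrative sketch only; real methods learn the encoder from data.
    """
    mu, var = prior_mu, prior_var
    for obs in context:              # incorporate one transition at a time
        gain = var / (var + noise_var)   # Kalman-style gain
        mu = mu + gain * (obs - mu)      # posterior mean moves toward the data
        var = (1.0 - gain) * var         # posterior variance shrinks
    return mu, var
```

A policy conditioned on this posterior can act under task uncertainty early in an episode and commit as the variance shrinks, which is the core idea behind disentangling task inference from control.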
Some Considerations on Learning to Explore via Meta-Reinforcement Learning
E-MAML and E-RL² deliver better performance on tasks where exploration is important; results are presented on a novel environment called "Krazy World" and on a set of maze environments.
Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
A practical algorithm, bootstrapping error accumulation reduction (BEAR), is proposed and it is demonstrated that BEAR is able to learn robustly from different off-policy distributions, including random and suboptimal demonstrations, on a range of continuous control tasks.
VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning
This paper introduces variational Bayes-Adaptive Deep RL (variBAD), a way to meta-learn to perform approximate inference in an unknown environment, and incorporate task uncertainty directly during action selection and achieves higher online return than existing methods.
Adaptive Gradient-Based Meta-Learning Methods
This approach enables the task-similarity to be learned adaptively, provides sharper transfer-risk bounds in the setting of statistical learning-to-learn, and leads to straightforward derivations of average-case regret bounds for efficient algorithms in settings where the task-environment changes dynamically or the tasks share a certain geometric structure.
Offline Meta Reinforcement Learning
A Bayesian RL (BRL) view is taken, and the recently proposed VariBAD BRL algorithm is extended to the off-policy setting, and learning of Bayes-optimal exploration strategies from offline data using deep neural networks is demonstrated.