• Corpus ID: 244954755

CoMPS: Continual Meta Policy Search

@article{Berseth2021CoMPSCM,
title={CoMPS: Continual Meta Policy Search},
author={Glen Berseth and Zhiwei Zhang and Grace H. Zhang and Chelsea Finn and Sergey Levine},
journal={ArXiv},
year={2021},
volume={abs/2112.04467}
}
• Published 8 December 2021
• Computer Science
• ArXiv
… analogously to PPO and other importance-sampled policy gradient algorithms. We use this estimator for the inner loop update in Algorithm 1 line 5. We show in our ablation experiments that this approach is needed to enable successful meta-training using the exhaustive off-policy experience collected by CoMPS.
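The snippet above refers to a PPO-style importance-sampled estimator for the inner-loop update. As a minimal sketch (not the authors' implementation), a clipped importance-weighted surrogate of the kind PPO uses can be written as follows; the function name and signature are illustrative assumptions:

```python
import numpy as np

def clipped_surrogate_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO-style clipped, importance-sampled surrogate objective (sketch).

    ratio = pi_new(a|s) / pi_old(a|s), computed from log-probabilities of
    actions collected under the behavior (old) policy. Clipping the ratio
    keeps the update close to the data-collection policy, which is what
    makes reusing off-policy experience stable.
    """
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the elementwise minimum (a pessimistic lower bound on the
    # surrogate); the loss to minimize is its negated mean.
    return -np.mean(np.minimum(unclipped, clipped))
```

With identical old and new policies the ratio is 1 everywhere, so the loss reduces to the negated mean advantage; as the new policy drifts, the clip bounds how much any sample can contribute.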
1 Citation

Task-Agnostic Continual Reinforcement Learning: In Praise of a Simple Baseline
• Computer Science
ArXiv
• 2022
Evidence is laid out that 3RL's outperformance stems from its ability to quickly infer how new tasks relate to previous ones, enabling forward transfer, by analyzing different training statistics including gradient conflict.

References

SHOWING 1-10 OF 74 REFERENCES
Proximal Policy Optimization Algorithms
• Computer Science
ArXiv
• 2017
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.
Evolved Policy Gradients
• Computer Science
NeurIPS
• 2018
Empirical results show that the evolved policy gradient algorithm (EPG) achieves faster learning on several randomized environments compared to an off-the-shelf policy gradient method, that its learned loss can generalize to out-of-distribution test-time tasks, and that it exhibits qualitatively different behavior from other popular meta-learning algorithms.
ProMP: Proximal Meta-Policy Search
• Computer Science
ICLR
• 2019
A novel meta-learning algorithm is developed that overcomes both the issue of poor credit assignment and previous difficulties in estimating meta-policy gradients; it leads to superior pre-adaptation policy behavior and consistently outperforms previous meta-RL algorithms in sample efficiency, wall-clock time, and asymptotic performance.
Guided Meta-Policy Search
• Computer Science
NeurIPS
• 2019
This paper proposes to learn a reinforcement learning procedure through imitation of expert policies that solve previously-seen tasks, and demonstrates significant improvements in meta-RL sample efficiency in comparison to prior work as well as the ability to scale to domains with visual observations.
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
• Computer Science
ICML
• 2019
This paper develops an off-policy meta-RL algorithm that disentangles task inference and control and performs online probabilistic filtering of latent task variables to infer how to solve a new task from small amounts of experience.
Some Considerations on Learning to Explore via Meta-Reinforcement Learning
• Computer Science
ICLR
• 2018
Results are presented on a novel environment called "Krazy World" and a set of maze environments, where E-MAML and E-RL² deliver better performance on tasks where exploration is important.
Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
• Computer Science
NeurIPS
• 2019
A practical algorithm, bootstrapping error accumulation reduction (BEAR), is proposed and it is demonstrated that BEAR is able to learn robustly from different off-policy distributions, including random and suboptimal demonstrations, on a range of continuous control tasks.
VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning
• Computer Science
ICLR
• 2020
This paper introduces variational Bayes-Adaptive Deep RL (variBAD), a way to meta-learn approximate inference in an unknown environment and to incorporate task uncertainty directly during action selection; it achieves higher online return than existing methods.