Corpus ID: 53015479

ProMP: Proximal Meta-Policy Search

@article{Rothfuss2019ProMPPM,
  title={ProMP: Proximal Meta-Policy Search},
  author={Jonas Rothfuss and Dennis Lee and Ignasi Clavera and Tamim Asfour and Pieter Abbeel},
  journal={ArXiv},
  year={2019},
  volume={abs/1810.06784}
}
Credit assignment in Meta-reinforcement learning (Meta-RL) is still poorly understood. Existing methods either neglect credit assignment to pre-adaptation behavior or implement it naively. This leads to poor sample-efficiency during meta-training as well as ineffective task identification strategies. This paper provides a theoretical analysis of credit assignment in gradient-based Meta-RL. Building on the gained insights we develop a novel meta-learning algorithm that overcomes both the issue of poor credit assignment and previous difficulties in estimating meta-policy gradients.
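To make the setup concrete, here is a minimal sketch of the gradient-based meta-RL loop the abstract refers to: one inner policy-gradient adaptation step, followed by a PPO-style clipped surrogate evaluated with the adapted parameters. The toy two-action bandit, the hyperparameters, and all function names are illustrative assumptions, not the paper's implementation.

```python
import jax
import jax.numpy as jnp

inner_lr, clip_eps = 0.1, 0.2

def logp(theta, a):
    # log-probability of action a under a 2-action softmax policy
    return jax.nn.log_softmax(theta)[a]

def inner_objective(theta, task_reward):
    # pre-adaptation surrogate: expected reward under the current policy
    return jnp.sum(jax.nn.softmax(theta) * task_reward)

def adapted_params(theta, task_reward):
    # one inner policy-gradient step (the MAML-style "adaptation")
    return theta + inner_lr * jax.grad(inner_objective)(theta, task_reward)

def outer_objective(theta, task_reward, a, adv, logp_old):
    # PPO-style clipped surrogate evaluated with the *adapted* parameters
    theta_prime = adapted_params(theta, task_reward)
    ratio = jnp.exp(logp(theta_prime, a) - logp_old)
    return jnp.minimum(ratio * adv,
                       jnp.clip(ratio, 1 - clip_eps, 1 + clip_eps) * adv)

theta = jnp.zeros(2)
task_reward = jnp.array([1.0, 0.0])           # toy task: action 0 is rewarded
meta_grad = jax.grad(outer_objective)(theta, task_reward, 0, 1.0, jnp.log(0.5))
print(meta_grad)
```

Differentiating the outer objective through `adapted_params` is exactly where pre-adaptation credit assignment enters: the meta-gradient must account for how pre-update behavior shapes the adaptation step.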
Meta-Q-Learning
TLDR
Experiments on standard continuous-control benchmarks suggest that MQL compares favorably with the state of the art in meta-RL; it draws upon ideas in propensity estimation to do so, thereby amplifying the amount of data available for adaptation.
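The propensity-estimation idea can be sketched generically: train a classifier to distinguish new-task transitions from meta-training transitions, then reweight the old data by the ratio f/(1-f). The features, logistic model, and optimizer below are illustrative stand-ins, not MQL's actual pipeline.

```python
import jax
import jax.numpy as jnp

def logits(w, x):
    return x @ w

def loss(w, x, y):
    # binary cross-entropy: y=1 for new-task samples, y=0 for buffer samples
    p = jax.nn.sigmoid(logits(w, x))
    return -jnp.mean(y * jnp.log(p + 1e-8) + (1 - y) * jnp.log(1 - p + 1e-8))

k1, k2 = jax.random.split(jax.random.PRNGKey(0))
x_old = jax.random.normal(k1, (256, 4))          # meta-training transitions
x_new = jax.random.normal(k2, (64, 4)) + 0.5     # shifted new-task transitions
x = jnp.concatenate([x_old, x_new])
y = jnp.concatenate([jnp.zeros(256), jnp.ones(64)])

w = jnp.zeros(4)
for _ in range(200):                             # plain gradient descent
    w -= 0.5 * jax.grad(loss)(w, x, y)

f = jax.nn.sigmoid(logits(w, x_old))
weights = f / (1 - f)      # propensity ratio (up to the class-prior constant)
print(weights.mean())      # use these to reweight old data during adaptation
```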
Learn to Effectively Explore in Context-Based Meta-RL
TLDR
A novel off-policy context-based meta-RL approach is presented that efficiently learns a separate exploration policy to support fast adaptation, as well as a context-aware exploitation policy to maximize extrinsic return.
MAME : Model-Agnostic Meta-Exploration
TLDR
This work proposes to explicitly model a separate exploration policy for the task distribution, shows that using self-supervised or supervised learning objectives for adaptation allows for more efficient inner-loop updates, and demonstrates superior performance compared to prior work.
MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration
TLDR
A new off-policy meta-RL framework is developed, which efficiently learns separate context-aware exploration and exploitation policies by sharing the knowledge of task inference, and significantly outperforms state-of-the-art baselines on various sparse-reward MuJoCo locomotion tasks and more complex sparse-reward Meta-World tasks.
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
TLDR
This paper develops an off-policy meta-RL algorithm that disentangles task inference and control and performs online probabilistic filtering of latent task variables to infer how to solve a new task from small amounts of experience.
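The "probabilistic context variables" mechanism can be illustrated as a product of Gaussian factors, one per context transition, filtered into a posterior over a latent task variable z that the policy would condition on. The linear encoder and all shapes below are assumptions for illustration.

```python
import jax
import jax.numpy as jnp

def encode(params, transition):
    # illustrative linear encoder: transition -> (mu, log_var) of one factor
    h = transition @ params
    return h[:2], h[2:]            # latent dimension 2

def posterior(params, context):
    # product of independent Gaussian factors (permutation-invariant filter)
    mus, log_vars = jax.vmap(lambda t: encode(params, t))(context)
    precisions = jnp.exp(-log_vars)
    var = 1.0 / jnp.sum(precisions, axis=0)
    mu = var * jnp.sum(precisions * mus, axis=0)
    return mu, var

k1, k2 = jax.random.split(jax.random.PRNGKey(0))
params = jax.random.normal(k1, (6, 4)) * 0.1   # transition dim 6 -> 2*latent
context = jax.random.normal(k2, (10, 6))       # 10 (s, a, r, s') tuples
mu, var = posterior(params, context)
print(mu, var)                                 # variance shrinks as context grows
```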
Double Meta-Learning for Data Efficient Policy Optimization in Non-Stationary Environments
TLDR
This paper proposes a meta-reinforcement learning approach that learns the dynamic model of a non-stationary environment for later use in meta-policy optimization, and demonstrates that the proposed method can meta-learn the policy with the data efficiency of model-based learning approaches while achieving the high asymptotic performance of model-free meta-reinforcement learning.
Curriculum in Gradient-Based Meta-Reinforcement Learning
TLDR
Meta Active Domain Randomization (meta-ADR) is proposed, which learns a curriculum of tasks for gradient-based meta-RL in a similar manner as ADR does for sim2real transfer; this approach is found to induce more stable policies on a variety of simulated locomotion and navigation tasks.
Guided Meta-Policy Search
TLDR
This paper proposes to learn a reinforcement learning procedure through imitation of expert policies that solve previously-seen tasks, and demonstrates significant improvements in meta-RL sample efficiency in comparison to prior work as well as the ability to scale to domains with visual observations.
Taming MAML: Efficient unbiased meta-reinforcement learning
TLDR
A surrogate objective function named Taming MAML (TMAML) is proposed that adds control variates into gradient estimation via automatic differentiation and improves the quality of gradient estimation by reducing variance without introducing bias.
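The variance-reduction principle at work can be shown with the textbook control-variate construction: subtracting a baseline from the return inside a score-function estimator leaves the gradient unbiased while shrinking its variance. This is the generic construction, not TMAML's specific meta-gradient control variates.

```python
import jax
import jax.numpy as jnp

theta = jnp.array(3.0)

def sample_grads(key, baseline, n=5000):
    # policy: a ~ N(theta, 1); reward R(a) = a; score function = (a - theta)
    a = theta + jax.random.normal(key, (n,))
    return (a - theta) * (a - baseline)   # per-sample gradient estimates

key = jax.random.PRNGKey(0)
g_plain = sample_grads(key, baseline=0.0)
g_cv = sample_grads(key, baseline=theta)  # baseline acts as a control variate
print(g_plain.mean(), g_cv.mean())        # both ~1.0: estimator stays unbiased
print(g_plain.var(), g_cv.var())          # variance drops sharply with baseline
```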
Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling
TLDR
This work presents model identification and experience relabeling (MIER), a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.

References

Showing 1-10 of 42 references
Model-Based Reinforcement Learning via Meta-Policy Optimization
TLDR
This work proposes Model-Based Meta-Policy-Optimization (MB-MPO), an approach that foregoes the strong reliance on accurate learned dynamics models and instead uses an ensemble of learned dynamics models to create a policy that can quickly adapt to any model in the ensemble with one policy gradient step.
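A hedged sketch of the idea: maintain an ensemble of learned dynamics models and meta-learn a policy whose single-gradient-step adaptation performs well under every model. The linear dynamics, quadratic cost, and two-step horizon below are toy stand-ins, not the paper's setup.

```python
import jax
import jax.numpy as jnp

inner_lr = 0.1
models = [jnp.array(0.9), jnp.array(1.0), jnp.array(1.1)]  # ensemble: x' = k*x + a

def ret(theta, k, x0=1.0):
    # deterministic 2-step return under model k with linear policy a = theta*x
    x1 = k * x0 + theta * x0
    x2 = k * x1 + theta * x1
    return -(x1**2 + x2**2)          # reward: drive the state toward zero

def meta_objective(theta):
    # average post-adaptation return across the model ensemble
    def adapted(k):
        return ret(theta + inner_lr * jax.grad(ret)(theta, k), k)
    return jnp.mean(jnp.stack([adapted(k) for k in models]))

theta = jnp.array(0.0)
for _ in range(100):
    theta += 0.01 * jax.grad(meta_objective)(theta)   # gradient ascent
print(theta)   # an initialization one adaptation step from good under each model
```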
Meta-Reinforcement Learning of Structured Exploration Strategies
TLDR
This work introduces a novel gradient-based fast adaptation algorithm -- model agnostic exploration with structured noise (MAESN) -- to learn exploration strategies from prior experience that are informed by prior knowledge and are more effective than random action-space noise.
Proximal Policy Optimization Algorithms
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.
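For reference, the clipped surrogate objective this family of methods optimizes can be written out directly; epsilon = 0.2 is the commonly used clipping range, and the batch values below are placeholders.

```python
import jax.numpy as jnp

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    # L = E[ min(r * A, clip(r, 1-eps, 1+eps) * A) ], with r = pi_new / pi_old
    ratio = jnp.exp(logp_new - logp_old)
    clipped = jnp.clip(ratio, 1.0 - eps, 1.0 + eps)
    return -jnp.mean(jnp.minimum(ratio * advantages, clipped * advantages))

# usage: minimize this loss for several epochs on the same sampled batch
loss = ppo_clip_loss(jnp.array([-0.9]), jnp.array([-1.0]), jnp.array([2.0]))
print(loss)
```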
Infinite-Horizon Policy-Gradient Estimation
TLDR
GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward in Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies, is introduced.
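The estimator's structure is easy to state: each reward r_t is credited only to the score-function terms of actions taken at or before time t, which lowers variance relative to plain REINFORCE. The toy Gaussian policy and trajectory below are illustrative.

```python
import jax
import jax.numpy as jnp

def logp(theta, s, a):
    # toy Gaussian policy: a ~ N(theta * s, 1), up to an additive constant
    return -0.5 * (a - theta * s) ** 2

def gpomdp_grad(theta, states, actions, rewards, gamma=0.99):
    # grad log pi for every step of the trajectory
    grads = jax.vmap(jax.grad(logp), in_axes=(None, 0, 0))(theta, states, actions)
    cum = jnp.cumsum(grads)                    # sum_{k<=t} grad log pi_k
    discounts = gamma ** jnp.arange(len(rewards))
    return jnp.sum(cum * discounts * rewards)  # sum_t cum_t * gamma^t * r_t

key = jax.random.PRNGKey(0)
states = jax.random.normal(key, (5,))
actions = states * 0.3
rewards = jnp.ones(5)
print(gpomdp_grad(jnp.array(0.1), states, actions, rewards))
```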
On First-Order Meta-Learning Algorithms
TLDR
A family of algorithms for learning a parameter initialization that can be fine-tuned quickly on a new task, using only first-order derivatives for the meta-learning updates, including Reptile, which works by repeatedly sampling a task, training on it, and moving the initialization towards the trained weights on that task.
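Reptile itself is simple enough to sketch in full: sample a task, run a few steps of SGD on it, then move the initialization toward the trained weights. The quadratic per-task losses below are toy stand-ins for real task objectives.

```python
import jax
import jax.numpy as jnp

meta_lr, inner_lr, inner_steps = 0.5, 0.1, 5
task_optima = [jnp.array([1.0, 0.0]), jnp.array([0.0, 1.0])]

def task_loss(w, opt):
    return jnp.sum((w - opt) ** 2)

theta = jnp.zeros(2)
for it in range(100):
    opt = task_optima[it % 2]        # "sample" a task
    w = theta
    for _ in range(inner_steps):     # inner SGD on the sampled task
        w -= inner_lr * jax.grad(task_loss)(w, opt)
    theta += meta_lr * (w - theta)   # Reptile: move init toward trained weights
print(theta)                         # settles between the two task optima
```

Note that no second derivatives appear anywhere, which is what "first-order" refers to.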
Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments
TLDR
A simple gradient-based meta-learning algorithm suitable for adaptation in dynamically changing and adversarial scenarios is developed, and it is demonstrated that meta-learning enables significantly more efficient adaptation than reactive baselines in the few-shot regime.
Constrained Policy Optimization
TLDR
Constrained Policy Optimization (CPO) is proposed, the first general-purpose policy search algorithm for constrained reinforcement learning with guarantees for near-constraint satisfaction at each iteration, and allows us to train neural network policies for high-dimensional control while making guarantees about policy behavior all throughout training.
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning.
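The model-agnostic recipe can be sketched compactly: take one gradient step per task, then differentiate the post-adaptation loss through that step to update the shared initialization. The quadratic task losses and hyperparameters below are illustrative.

```python
import jax
import jax.numpy as jnp

alpha, beta = 0.1, 0.05
task_optima = jnp.array([[2.0, 0.0], [0.0, 2.0]])

def loss(w, opt):
    return jnp.sum((w - opt) ** 2)

def maml_objective(theta):
    def post_adapt(opt):
        w = theta - alpha * jax.grad(loss)(theta, opt)   # inner adaptation step
        return loss(w, opt)                              # loss after adapting
    return jnp.mean(jax.vmap(post_adapt)(task_optima))

theta = jnp.zeros(2)
for _ in range(200):
    theta -= beta * jax.grad(maml_objective)(theta)      # second-order meta step
print(theta)   # an initialization that adapts to either task in one step
```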
Meta-Gradient Reinforcement Learning
TLDR
A gradient-based meta-learning algorithm is discussed that is able to adapt the nature of the return online, whilst interacting with and learning from the environment, achieving a new state-of-the-art performance.
A Simple Neural Attentive Meta-Learner
TLDR
This work proposes a class of simple and generic meta-learner architectures that use a novel combination of temporal convolutions and soft attention; the former to aggregate information from past experience and the latter to pinpoint specific pieces of information.