Corpus ID: 204838225

Bottom-Up Meta-Policy Search

@article{Melo2019BottomUpMS,
  title={Bottom-Up Meta-Policy Search},
  author={Luckeciano Carvalho Melo and Marcos R.O.A. Maximo and Adilson Marques da Cunha},
  journal={ArXiv},
  year={2019},
  volume={abs/1910.10232}
}
Despite the recent progress in agents that learn through interaction, there are several challenges in terms of sample efficiency and generalization across behaviors unseen during training. To mitigate these problems, we propose and apply a first-order Meta-Learning algorithm called Bottom-Up Meta-Policy Search (BUMPS), which works with a two-phase optimization procedure: firstly, in a meta-training phase, it distills a few expert policies to create a meta-policy capable of generalizing knowledge… 
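
To make the described two-phase idea more concrete, the following is a minimal, hypothetical sketch of the first (meta-training) phase: a few expert policies are distilled into a single meta-policy by behavioral cloning, with a first-order outer update that moves the meta-parameters toward the distilled weights. All names (MetaPolicy, make_expert_batch), network sizes, and hyperparameters are illustrative assumptions, not taken from the paper.

    # Hypothetical sketch of the meta-training phase described in the abstract.
    # Phase 2 (fine-tuning on a new task) is omitted.
    import torch
    import torch.nn as nn

    obs_dim, act_dim = 8, 2

    class MetaPolicy(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
        def forward(self, obs):
            return self.net(obs)

    def make_expert_batch(task_id, n=256):
        """Placeholder for (state, expert action) pairs collected from expert policy `task_id`."""
        torch.manual_seed(task_id)
        obs = torch.randn(n, obs_dim)
        acts = torch.tanh(obs @ torch.randn(obs_dim, act_dim))  # stand-in expert labels
        return obs, acts

    meta_policy = MetaPolicy()
    meta_lr, inner_lr, inner_steps = 0.1, 1e-2, 20
    tasks = [0, 1, 2]  # a few expert policies, as in the abstract

    for outer_iter in range(100):
        task = tasks[outer_iter % len(tasks)]
        obs, expert_acts = make_expert_batch(task)

        # inner loop: clone the meta-policy and distill one expert by behavioral cloning
        learner = MetaPolicy()
        learner.load_state_dict(meta_policy.state_dict())
        opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
        for _ in range(inner_steps):
            loss = nn.functional.mse_loss(learner(obs), expert_acts)
            opt.zero_grad()
            loss.backward()
            opt.step()

        # first-order outer update: move meta-parameters toward the distilled weights
        with torch.no_grad():
            for p_meta, p_task in zip(meta_policy.parameters(), learner.parameters()):
                p_meta += meta_lr * (p_task - p_meta)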

Citations

Transformers are Meta-Reinforcement Learners

This work presents TrMRL, a meta-RL agent that mimics the memory reinstatement mechanism using the transformer architecture and shows that the self-attention computes a consensus representation that minimizes the Bayes Risk at each layer and provides meaningful features to compute the best actions.

Push Recovery Strategies through Deep Reinforcement Learning

  • D. C. Melo, M. Maximo, A. Cunha
  • Computer Science
    2020 Latin American Robotics Symposium (LARS), 2020 Brazilian Symposium on Robotics (SBR) and 2020 Workshop on Robotics in Education (WRE)
  • 2020
This work implements a Push Recovery controller that improves a walking engine used by a simulated humanoid agent from the RoboCup 3D Soccer Simulation League, and achieves an expert policy, represented by a Deep Neural Network, which is compatible with a Zero Moment Point walking engine.

Learning Push Recovery Behaviors for Humanoid Walking Using Deep Reinforcement Learning

An implementation of a Push Recovery controller that improves the walking engine's performance used by a simulated humanoid agent from the RoboCup 3D Soccer Simulation League environment, and proposes two approaches based on Transfer Learning and Imitation Learning to achieve a final policy which performs well across a wide range of disturbance directions.

Deep Reinforcement Learning for Humanoid Robot Behaviors

This article uses a hierarchical controller where a model-free policy learns to interact with a model-based walking algorithm, and uses DRL algorithms for an agent to learn how to perform humanoid robot behaviors: completing a racing track as fast as possible and dribbling against a single opponent.

Learning Humanoid Robot Running Motions with Symmetry Incentive through Proximal Policy Optimization

A methodology based on deep reinforcement learning to develop running skills in a humanoid robot with no prior knowledge, which outperforms the state of the art in sprint speed by approximately 50%.

References

Showing 1-10 of 34 references

On First-Order Meta-Learning Algorithms

A family of algorithms for learning a parameter initialization that can be fine-tuned quickly on a new task, using only first-order derivatives for the meta-learning updates, including Reptile, which works by repeatedly sampling a task, training on it, and moving the initialization towards the trained weights on that task.
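
Since the summary above spells out the update rule (sample a task, train on it, move the initialization toward the trained weights), here is a minimal sketch of that rule on a toy sine-regression task family. The task distribution, feature model, and hyperparameters are illustrative assumptions only.

    # Minimal first-order meta-learning (Reptile-style) sketch on toy sine regression.
    import numpy as np

    rng = np.random.default_rng(0)
    theta = rng.normal(scale=0.1, size=(64,))  # initialization being meta-learned (illustrative model)

    def task_loss_grad(params, amplitude, phase, n=32):
        """Gradient of 0.5 * MSE for fitting y = amplitude*sin(x+phase) with fixed sine features."""
        x = rng.uniform(-5, 5, n)
        feats = np.stack([np.sin(x + k) for k in range(64)], axis=1)
        y = amplitude * np.sin(x + phase)
        pred = feats @ params
        return feats.T @ (pred - y) / n

    inner_lr, outer_lr, inner_steps = 0.02, 0.1, 10
    for it in range(1000):
        amplitude, phase = rng.uniform(0.1, 5.0), rng.uniform(0, np.pi)  # sample a task
        phi = theta.copy()
        for _ in range(inner_steps):                 # train on the sampled task
            phi -= inner_lr * task_loss_grad(phi, amplitude, phase)
        theta += outer_lr * (phi - theta)            # move initialization toward trained weights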

Divide-and-Conquer Reinforcement Learning

The results show that divide-and-conquer RL greatly outperforms conventional policy gradient methods on challenging grasping, manipulation, and locomotion tasks, and exceeds the performance of a variety of prior methods.

Evolved Policy Gradients

Empirical results show that the evolved policy gradient algorithm (EPG) achieves faster learning on several randomized environments compared to an off-the-shelf policy gradient method, and that its learned loss can generalize to out-of-distribution test-time tasks and exhibits qualitatively different behavior from other popular meta-learning algorithms.

Proximal Policy Optimization Algorithms

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.
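
As an illustration of the "surrogate" objective mentioned above, the snippet below sketches the clipped variant of that objective; the log-probabilities and advantage estimates are placeholder tensors, and the clipping coefficient is an assumed default.

    # Sketch of a clipped surrogate policy loss (negated so it can be minimized).
    import torch

    def clipped_surrogate_loss(logp_new, logp_old, advantages, clip_eps=0.2):
        ratio = torch.exp(logp_new - logp_old)            # pi_new(a|s) / pi_old(a|s)
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
        return -torch.min(unclipped, clipped).mean()

    # toy usage with placeholder data
    logp_old = torch.randn(128)
    logp_new = logp_old + 0.05 * torch.randn(128)
    adv = torch.randn(128)
    loss = clipped_surrogate_loss(logp_new, logp_old, adv)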

End-to-End Training of Deep Visuomotor Policies

This paper develops a method that can be used to learn policies that map raw image observations directly to torques at the robot's motors, trained using a partially observed guided policy search method, with supervision provided by a simple trajectory-centric reinforcement learning method.

Learning to reinforcement learn

This work introduces a novel approach to deep meta-reinforcement learning: a system trained using one RL algorithm, but whose recurrent dynamics implement a second, quite separate RL procedure.

Control of exploitation-exploration meta-parameter in reinforcement learning

One-Shot Imitation Learning

A meta-learning framework for achieving one-shot imitation learning, in which, ideally, robots should be able to learn from very few demonstrations of any given task and instantly generalize to new situations of the same task, without requiring task-specific engineering.

Learning to Learn: Meta-Critic Networks for Sample Efficient Learning

A meta-critic approach to meta-learning is proposed: an action-value function neural network that learns to criticise any actor trying to solve any specified task, acting as a trainable task-parametrised loss generator.

Meta-Learning with Temporal Convolutions

This work proposes a class of simple and generic meta-learner architectures based on temporal convolutions, which is domain-agnostic, has no particular strategy or algorithm encoded into it, and outperforms state-of-the-art methods that are less general and more complex.