Curriculum goal masking for continuous deep reinforcement learning

  • Manfred Eppe, Sven Magg, Stefan Wermter
  • Published 17 September 2018
  • Computer Science
  • 2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob)
Deep reinforcement learning has recently gained attention for problems where policy or value functions are based on universal value function approximators (UVFAs), which render them independent of goals. Evidence exists that the sampling of goals has a strong effect on learning performance, and the problem of optimizing goal sampling is frequently tackled with intrinsic motivation methods. However, there is a lack of general mechanisms that focus on goal sampling in the context of deep…
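The masking idea behind the title can be sketched roughly as follows: a subset of goal dimensions is "masked out" by copying them from the already-achieved state, so only the remaining dimensions must still be reached, yielding an easier intermediate goal. This is an illustrative reading, not the authors' implementation; the function name and array layout are hypothetical.

```python
import numpy as np

def mask_goal(goal, achieved, mask):
    """Return a simplified goal: dimensions where `mask` is True are
    replaced by their already-achieved values, so they are trivially
    satisfied and only the unmasked dimensions remain to be learned."""
    goal = np.asarray(goal, dtype=float)
    achieved = np.asarray(achieved, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    return np.where(mask, achieved, goal)

# Example: a 3-D goal whose z dimension is masked out, i.e. copied
# from the achieved state; the agent only has to reach x and y.
g = mask_goal([1.0, 2.0, 3.0], achieved=[0.0, 0.0, 0.5],
              mask=[False, False, True])
```

A curriculum then emerges by gradually shrinking the mask as competence grows, until the full unmasked goal is used.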

Figures from this paper

Automatic Curriculum Learning For Deep RL: A Short Survey

The ambition of this work is to present a compact and accessible introduction to the automatic curriculum learning (ACL) literature, to draw a bigger picture of the current state of the art in ACL, and to encourage the cross-breeding of existing concepts and the emergence of new ideas.

Skill-based curiosity for intrinsically motivated reinforcement learning

This work proposes a novel end-to-end curiosity mechanism for deep reinforcement learning that allows an agent to gradually acquire new skills, and compares an agent augmented with this curiosity reward against state-of-the-art learners.

From Semantics to Execution: Integrating Action Planning With Reinforcement Learning for Robotic Causal Problem-Solving

The paper demonstrates how reward sparsity can serve as a bridge between the high-level and low-level state and action spaces, and shows that the integrated method is able to solve robotic tasks involving non-trivial causal dependencies under noisy conditions, exploiting both data and knowledge.

Follow the Object: Curriculum Learning for Manipulation Tasks with Imagined Goals

The proposed algorithm, Follow the Object (FO), has been evaluated on 7 MuJoCo environments requiring increasing degrees of exploration, and has achieved higher success rates than alternative algorithms.

Solving Robotic Manipulation with Sparse Reward Reinforcement Learning via Graph-Based Diversity and Proximity

In multi-goal reinforcement learning (RL), algorithms usually suffer from inefficiency in collecting successful experiences in tasks with sparse rewards. By utilizing the ideas of…

Complex Robotic Manipulation via Graph-Based Hindsight Goal Generation

This work proposes graph-based hindsight goal generation (G-HGG), an extension of HGG that selects hindsight goals based on shortest distances in an obstacle-avoiding graph, a discrete representation of the environment.

A Conceptual Framework for Externally-influenced Agents: An Assisted Reinforcement Learning Review

This work proposes a conceptual framework and taxonomy for assisted reinforcement learning, aimed at fostering such collaboration by classifying and comparing various methods that use external information in the learning process.

Curiosity-Driven Multi-Criteria Hindsight Experience Replay

This work presents a method that combines hindsight with curiosity-driven exploration and curriculum learning in order to solve the challenging sparse-reward block stacking task, and is the first to stack more than two blocks using only sparse rewards without human demonstrations.

Reinforcement Learning with Time-dependent Goals for Robotic Musicians

This paper addresses robotic musicianship by introducing a temporal extension to goal-conditioned reinforcement learning: Time-dependent goals, and demonstrates that these can be used to train a robotic musician to play the theremin instrument.

Curriculum Learning: A Survey

This survey shows how the limits of curriculum learning have been tackled in the literature, presents curriculum learning instantiations for various machine learning tasks, and constructs a multi-perspective clustering algorithm, linking the discovered clusters with the taxonomy.



Accelerating Deep Continuous Reinforcement Learning through Task Simplification

This work proposes a novel method for accelerating the learning process through task simplification, inspired by the Goldilocks effect known from developmental psychology, and describes modifications to the replay buffer of the DDPG algorithm that prevent artifacts from the simplified learning instances while maintaining the speed of learning.

Human-level control through deep reinforcement learning

This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

Universal Value Function Approximators

An efficient technique is developed for supervised learning of universal value function approximators (UVFAs) V(s, g; θ) that generalise not just over states s but also over goals g, and it is demonstrated that a UVFA can successfully generalise to previously unseen goals.
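The defining property of a UVFA is that one set of parameters θ scores any (state, goal) pair. A minimal sketch, with a linear model over the concatenated [s, g] vector standing in for the neural network used in practice:

```python
import numpy as np

def uvfa_value(state, goal, theta):
    """Illustrative UVFA V(s, g; theta): the goal g is an input alongside
    the state s, so the same parameters generalise across goals rather
    than requiring one value function per goal."""
    x = np.concatenate([state, goal])
    return float(theta @ x)

# The same theta evaluates any (state, goal) pair.
v = uvfa_value(np.array([1.0, 0.0]), np.array([0.0, 2.0]),
               theta=np.array([0.5, 0.5, 0.5, 0.5]))
```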

Reverse Curriculum Generation for Reinforcement Learning

This work proposes a method to learn goal-oriented tasks without requiring any prior knowledge other than obtaining a single state in which the task is achieved, and generates a curriculum of start states that adapts to the agent's performance, leading to efficient training on goal- oriented tasks.

Hindsight Experience Replay

A novel technique is presented which allows sample-efficient learning from rewards that are sparse and binary, thereby avoiding the need for complicated reward engineering; it may be seen as a form of implicit curriculum.
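The hindsight trick can be sketched in a few lines: failed episodes are replayed as if the goal had been a state that was actually achieved, so sparse binary rewards still produce learning signal. The dict layout and the choice of the 'final' relabeling strategy below are illustrative, not the paper's exact implementation.

```python
def her_relabel(episode, reward_fn):
    """Hindsight relabeling ('final' strategy): substitute the goal of
    every transition with the goal achieved at the end of the episode
    and recompute the sparse reward against that substituted goal."""
    final_goal = episode[-1]['achieved_goal']
    return [{**t, 'goal': final_goal,
             'reward': reward_fn(t['achieved_goal'], final_goal)}
            for t in episode]

# Sparse binary reward: 0 when the (relabeled) goal is achieved, -1 otherwise.
sparse = lambda achieved, goal: 0.0 if achieved == goal else -1.0
episode = [{'achieved_goal': g, 'goal': 9} for g in (1, 2, 3)]
relabeled = her_relabel(episode, sparse)
```

After relabeling, the final transition earns reward 0 even though the original goal (9) was never reached, which is what makes the technique an implicit curriculum.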

Continuous control with deep reinforcement learning

This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.

From semantics to execution: Integrating action planning with reinforcement learning for robotic tool use

It is demonstrated that the integrated neuro-symbolic method is able to solve object manipulation problems that involve tool use and non-trivial causal dependencies under noisy conditions, exploiting both data and knowledge.

Accuracy-based Curriculum Learning in Deep Reinforcement Learning

It is shown that adaptive selection of accuracy requirements, based on a local measure of competence progress, automatically generates a curriculum where difficulty progressively increases, resulting in a better learning efficiency than sampling randomly.

Source Task Creation for Curriculum Learning

This paper presents the more ambitious problem of curriculum learning in reinforcement learning, in which the goal is to design a sequence of source tasks for an agent to train on, such that final performance or learning speed is improved.

Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping

Conditions under which modifications to the reward function of a Markov decision process preserve the optimal policy are investigated, to shed light on the practice of reward shaping, a method used in reinforcement learning whereby additional training rewards are used to guide the learning agent.
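The policy-preserving condition established in that paper is that the shaping term be potential-based, F(s, s') = γΦ(s') − Φ(s) for some potential function Φ over states. A minimal sketch (the 1-D potential below is a hypothetical example):

```python
def shaped_reward(r, s, s_next, phi, gamma=0.99):
    """Potential-based reward shaping: adding F(s, s') = gamma*phi(s') - phi(s)
    to the environment reward preserves the optimal policy for any
    potential function phi over states."""
    return r + gamma * phi(s_next) - phi(s)

# With phi as a negated distance-to-goal heuristic, progress toward the
# goal yields a positive shaping term without changing the optimal policy.
phi = lambda s: -abs(s - 10.0)   # hypothetical 1-D potential
r_shaped = shaped_reward(0.0, s=4.0, s_next=5.0, phi=phi, gamma=1.0)
```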