• Corpus ID: 3562704

Learning by Playing - Solving Sparse Reward Tasks from Scratch

@article{Riedmiller2018LearningBP,
  title={Learning by Playing - Solving Sparse Reward Tasks from Scratch},
  author={Martin A. Riedmiller and Roland Hafner and Thomas Lampe and Michael Neunert and Jonas Degrave and Tom Van de Wiele and Volodymyr Mnih and Nicolas Manfred Otto Heess and Jost Tobias Springenberg},
  journal={ArXiv},
  year={2018},
  volume={abs/1802.10567}
}
We propose Scheduled Auxiliary Control (SAC-X), a new learning paradigm in the context of Reinforcement Learning (RL). The key idea behind our method is that active (learned) scheduling and execution of auxiliary policies allows the agent to efficiently explore its environment, enabling it to excel at sparse-reward RL. Our experiments in several challenging robotic manipulation settings demonstrate the power of our approach.
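A minimal sketch of the scheduling idea described in the abstract, under strong simplifying assumptions: ToyEnv, AUX_REWARDS, intention_policy, and Scheduler are hypothetical placeholders, not the SAC-X implementation. The point illustrated is that a scheduler picks one auxiliary intention at a time to execute, and learns to prefer intentions whose rollouts happen to collect sparse main-task reward, so auxiliary behaviour drives exploration for the main task.

```python
# Illustrative sketch of scheduled auxiliary control (not the authors' code).
import random
from collections import defaultdict

class ToyEnv:
    """Stand-in environment: a 1-D position the agent nudges left or right."""
    def __init__(self):
        self.pos = 0.0
    def reset(self):
        self.pos = 0.0
        return self.pos
    def step(self, action):
        self.pos += action
        return self.pos

# Auxiliary rewards are cheap signals defined on the state;
# the main-task reward is sparse (non-zero only near the goal).
AUX_REWARDS = {
    "move_right": lambda s: 1.0 if s > 0 else 0.0,
    "move_left":  lambda s: 1.0 if s < 0 else 0.0,
}
def main_reward(s):
    return 1.0 if abs(s - 5.0) < 0.5 else 0.0   # sparse goal signal

def intention_policy(task, state):
    """Placeholder per-task policy; in SAC-X each intention is a learned policy."""
    return {"move_right": 1.0, "move_left": -1.0, "main": 1.0}[task]

class Scheduler:
    """Learned scheduler: prefers intentions whose rollouts yielded main-task return."""
    def __init__(self, tasks, eps=0.3):
        self.q = defaultdict(float)   # estimated main-task return per intention
        self.tasks, self.eps = tasks, eps
    def pick(self):
        if random.random() < self.eps:
            return random.choice(self.tasks)
        return max(self.tasks, key=lambda t: self.q[t])
    def update(self, task, main_return, lr=0.1):
        self.q[task] += lr * (main_return - self.q[task])

env = ToyEnv()
tasks = list(AUX_REWARDS) + ["main"]
sched = Scheduler(tasks)
for episode in range(20):
    state, task = env.reset(), sched.pick()      # schedule one intention per episode
    main_return = 0.0
    for _ in range(10):                          # execute the scheduled intention
        state = env.step(intention_policy(task, state))
        main_return += main_reward(state)        # in SAC-X this experience would also
                                                 # update every intention off-policy
    sched.update(task, main_return)
print(dict(sched.q))
```

In the full method, experience gathered under any scheduled intention is replayed off-policy to train every intention's policy and critic; the sketch only tracks the scheduler's statistics to keep the example short.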
NON-PARAMETRIC DISCRIMINATIVE REWARDS
TLDR
An unsupervised learning algorithm is presented to train agents to achieve perceptually specified goals using only a stream of observations and actions; it simultaneously learns a goal-conditioned policy and a goal achievement reward function that measures how similar a state is to the goal state.
Unsupervised Control Through Non-Parametric Discriminative Rewards
TLDR
An unsupervised learning algorithm to train agents to achieve perceptually-specified goals using only a stream of observations and actions, which leads to a co-operative game and a learned reward function that reflects similarity in controllable aspects of the environment instead of distance in the space of observations.
Deep Reinforcement Learning with Skill Library : Exploring with Temporal Abstractions and coarse approximate Dynamics Models
TLDR
The benefits, in terms of speed and accuracy, of the proposed approaches for a set of real world complex robotic manipulation tasks in which some state-of-the-art methods completely fail are demonstrated.
Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning
TLDR
Learning from Guided Play is presented, a framework that leverages expert demonstrations of multiple auxiliary tasks in addition to a main task; the method compares favourably to supervised imitation learning and to a state-of-the-art AIL method.
BC + RL : Imitation Learning From Non-Optimal Demonstrations
TLDR
An algorithm is presented that allows for successful imitation learning from suboptimal demonstrations by combining behavioral cloning approaches with pure reinforcement learning to accelerate learning from sparse reward functions in robotic domains with long time horizons.
Multi-Task Reinforcement Learning without Interference
TLDR
This work develops a general approach that can change the multi-task optimization landscape to alleviate conflicting gradients across tasks and introduces two instantiations of this approach that prevent gradients for different tasks from interfering with one another.
Goal-constrained Sparse Reinforcement Learning for End-to-End Driving
TLDR
This work explores full-control driving with only a goal-constrained sparse reward and proposes a curriculum learning approach for end-to-end driving using only navigation view maps, which benefit from a small virtual-to-real domain gap.
Solving Compositional Reinforcement Learning Problems via Task Reduction
TLDR
Experimental results show that SIR can significantly accelerate and improve learning on a variety of challenging sparse-reward continuous-control problems with compositional structures.
Goal-conditioned Imitation Learning
TLDR
A novel algorithm, goalGAIL, is proposed, which incorporates demonstrations to drastically speed up convergence to a policy able to reach any goal, surpassing the performance of an agent trained with other imitation learning algorithms.
Sparse Curriculum Reinforcement Learning for End-to-End Driving
TLDR
This work explores driving using only goal-conditioned sparse rewards and proposes a curriculum learning approach for end-to-end driving using only navigation view maps, which benefit from a small virtual-to-real domain gap.
...
...

References

Overcoming Exploration in Reinforcement Learning with Demonstrations
TLDR
This work uses demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a robot arm.
Transfer in variable-reward hierarchical reinforcement learning
TLDR
This paper formally defines the transfer learning problem in the context of RL as learning an efficient algorithm to solve any SMDP drawn from a fixed distribution after experiencing a finite number of them, and introduces an online algorithm that compactly stores the optimal value functions for several SMDPs and uses them to optimally initialize the value function for a new SMDP.
Reinforcement Learning with Unsupervised Auxiliary Tasks
TLDR
This paper significantly outperforms the previous state-of-the-art on Atari, averaging 880% expert human performance, and a challenging suite of first-person, three-dimensional Labyrinth tasks leading to a mean speedup in learning of 10× and averaging 87% expert human performance on Labyrinth.
Hindsight Experience Replay
TLDR
A novel technique is presented which allows sample-efficient learning from rewards which are sparse and binary and therefore avoid the need for complicated reward engineering and may be seen as a form of implicit curriculum.
Continuous control with deep reinforcement learning
TLDR
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
Unsupervised Perceptual Rewards for Imitation Learning
TLDR
This work presents a method that is able to identify key intermediate steps of a task from only a handful of demonstration sequences and to automatically discover the most discriminative features for identifying these steps.
Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates
TLDR
It is demonstrated that a recent deep reinforcement learning algorithm based on off-policy training of deep Q-functions can scale to complex 3D manipulation tasks and can learn deep neural network policies efficiently enough to train on real physical robots.
Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards
TLDR
A general, model-free approach for reinforcement learning on real robots with sparse rewards, built upon the Deep Deterministic Policy Gradient (DDPG) algorithm so that it can use demonstrations; the approach outperforms DDPG and does not require engineered rewards.
The Option-Critic Architecture
TLDR
This work derives policy gradient theorems for options and proposes a new option-critic architecture capable of learning both the internal policies and the termination conditions of options, in tandem with the policy over options, and without the need to provide any additional rewards or subgoals.
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
TLDR
h-DQN is presented, a framework to integrate hierarchical value functions, operating at different temporal scales, with intrinsically motivated deep reinforcement learning, and allows for flexible goal specifications, such as functions over entities and relations.
...
...