• Corpus ID: 14717992

Reinforcement Learning with Unsupervised Auxiliary Tasks

Max Jaderberg, Volodymyr Mnih, Wojciech M. Czarnecki, Tom Schaul, Joel Z. Leibo, David Silver, Koray Kavukcuoglu
Deep reinforcement learning agents have achieved state-of-the-art results by directly maximising cumulative reward. However, environments contain a much wider variety of possible training signals. In this paper, we introduce an agent that also maximises many other pseudo-reward functions simultaneously by reinforcement learning. All of these tasks share a common representation that, like unsupervised learning, continues to develop in the absence of extrinsic rewards. We also introduce a novel… 
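The core idea of the abstract — many pseudo-reward tasks trained jointly through one shared representation — can be sketched as a single combined loss. The shapes, the squared-error surrogates, and the weighting `lam` below are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes: a shared linear "torso" feeds one main value head
# and several auxiliary heads, one per pseudo-reward function.
obs_dim, feat_dim, n_aux = 8, 16, 3
W_shared = 0.1 * rng.normal(size=(obs_dim, feat_dim))
w_main = 0.1 * rng.normal(size=feat_dim)
W_aux = 0.1 * rng.normal(size=(n_aux, feat_dim))

def losses(obs, target_main, targets_aux):
    """Squared-error surrogates for the main task and each auxiliary
    pseudo-reward task, all computed from the same shared features."""
    feat = np.tanh(obs @ W_shared)                 # shared representation
    main_loss = (feat @ w_main - target_main) ** 2
    aux_losses = (W_aux @ feat - targets_aux) ** 2
    return main_loss, aux_losses

def total_loss(obs, target_main, targets_aux, lam=0.1):
    """Joint objective: gradients of the auxiliary terms still flow
    through W_shared, so the representation keeps developing even when
    the extrinsic (main) reward signal is absent."""
    main_loss, aux_losses = losses(obs, target_main, targets_aux)
    return main_loss + lam * aux_losses.sum()

obs = rng.normal(size=obs_dim)
print(total_loss(obs, 1.0, np.zeros(n_aux)))
```

Setting `lam=0` recovers the plain RL objective; any positive weight lets the auxiliary tasks shape the shared features.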


Learning Embodied Agents with Scalably-Supervised Reinforcement Learning
This thesis considers alternative modalities of supervision that are more scalable and easier for a human user to provide, and shows that such supervision can drastically improve the agent's learning efficiency, enabling directed exploration and learning within a large search space of states.
Hybrid Reward Architecture for Reinforcement Learning
A new method is proposed, called Hybrid Reward Architecture (HRA), which takes as input a decomposed reward function and learns a separate value function for each component reward function, enabling more effective learning.
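The HRA idea — a separate value function per component of a decomposed reward, with actions chosen on their aggregate — can be sketched in tabular form. The 4-state ring environment, the three-way reward split, and all constants below are invented for illustration:

```python
# One Q-table per reward component; actions are greedy on the summed values.
n_states, n_actions, n_components = 4, 2, 3
Q = [[[0.0] * n_actions for _ in range(n_states)]
     for _ in range(n_components)]

def component_rewards(state, action):
    # Hypothetical decomposition: the environment reward is the sum
    # of these three simple parts.
    return [1.0 if state == 0 else 0.0,   # bonus for being in state 0
            0.5 * action,                 # action-dependent component
            -0.1]                         # constant step cost

def hra_update(state, action, next_state, alpha=0.1, gamma=0.9):
    """Q-learning update applied per component, each head learning
    toward its own component reward rather than the aggregate."""
    rs = component_rewards(state, action)
    for k in range(n_components):
        td_target = rs[k] + gamma * max(Q[k][next_state])
        Q[k][state][action] += alpha * (td_target - Q[k][state][action])

def act(state):
    """Greedy action on the summed component values."""
    totals = [sum(Q[k][state][a] for k in range(n_components))
              for a in range(n_actions)]
    return max(range(n_actions), key=totals.__getitem__)

# Sweep all (state, action) pairs on a deterministic ring transition.
for _ in range(500):
    for s in range(n_states):
        for a in range(n_actions):
            hra_update(s, a, (s + 1) % n_states)
```

Each head faces a simpler learning problem than the aggregate reward would pose, which is the source of HRA's more effective learning.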
Improving On-policy Learning with Statistical Reward Accumulation
An effective characterization of past reward statistics (which can be seen as long-term feedback signals) is introduced to supplement immediate reward feedback, along with a novel exploration mechanism called "hot-wiring" that can give a boost to seemingly trapped agents.
Self-Supervised Dueling Networks for Deep Reinforcement Learning
This paper examines the possibility of using self-supervised signals as auxiliary rewards and proposes a two-stream architecture: a controllable stream represents state features the agent can interact with and modify, while an uncontrollable stream represents state features of the environment over which the agent has no control.
Residual Reinforcement Learning from Demonstrations
Experiments on simulated manipulation tasks demonstrate that residual RL from demonstrations generalizes to unseen environment conditions more flexibly than either behavioral cloning or RL fine-tuning, and can solve high-dimensional, sparse-reward tasks out of reach for RL from scratch.
Dealing with Sparse Rewards in Reinforcement Learning
  • J. Hare
  • Computer Science, Psychology
  • 2019
This project introduces a novel reinforcement learning solution by combining aspects of two existing state-of-the-art sparse-reward solutions: curiosity-driven exploration and unsupervised auxiliary tasks.
Shared Learning: Enhancing Reinforcement in Q-Ensembles
The Shared Learning framework is proposed, aimed at making Q-ensemble algorithms data-efficient; it speeds up learning in Q-ensembles with minimal computational overhead on a suite of Atari 2600 games.
MERL: Multi-Head Reinforcement Learning
MERL, a general framework for structuring reinforcement learning by injecting problem knowledge into policy-gradient updates, is introduced and defined as the multi-head reinforcement learning framework used throughout the work.
Weakly-Supervised Reinforcement Learning for Controllable Behavior
This work introduces a framework for using weak supervision to automatically disentangle this semantically meaningful subspace of tasks from the enormous space of nonsensical "chaff" tasks, and shows that this learned subspace enables efficient exploration and provides a representation that captures distance between states.
URLB: Unsupervised Reinforcement Learning Benchmark
The Unsupervised Reinforcement Learning Benchmark (URLB) is introduced, providing twelve continuous-control tasks from three domains for evaluation along with open-source code for eight leading unsupervised RL methods; the implemented baselines make progress but are not able to solve URLB.


Human-level control through deep reinforcement learning
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Dueling Network Architectures for Deep Reinforcement Learning
This paper presents a new neural network architecture for model-free reinforcement learning that leads to better policy evaluation in the presence of many similar-valued actions and enables the RL agent to outperform the state-of-the-art on the Atari 2600 domain.
Deep Successor Reinforcement Learning
DSR is presented, which generalizes Successor Representations within an end-to-end deep reinforcement learning framework and has several appealing properties including: increased sensitivity to distal reward changes due to factorization of reward and world dynamics, and the ability to extract bottleneck states given successor maps trained under a random policy.
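The reward/dynamics factorization that gives DSR its sensitivity to distal reward changes can be shown in closed form on a toy chain. The 3-state dynamics and rewards below are invented for illustration; DSR itself learns these quantities end-to-end with deep networks rather than computing them analytically:

```python
import numpy as np

# Successor representation: M[s, s'] is the expected discounted future
# occupancy of s' starting from s under a fixed policy. Value then
# factorizes as V = M @ r, separating world dynamics (M) from reward (r).
gamma = 0.9
P = np.array([[0.0, 1.0, 0.0],    # state 0 -> 1
              [0.0, 0.0, 1.0],    # state 1 -> 2
              [0.0, 0.0, 1.0]])   # state 2 absorbs
M = np.linalg.inv(np.eye(3) - gamma * P)   # closed-form SR for fixed P
r = np.array([0.0, 0.0, 1.0])              # reward only at the distal state
V = M @ r

# Changing the distal reward only requires re-combining with the new
# reward vector -- the SR (the dynamics part) is untouched.
r_new = np.array([0.0, 0.0, 2.0])
V_new = M @ r_new
print(V)   # discounted occupancy of the rewarding state from each start
```

Because a reward change never touches `M`, values everywhere update in one step, which is the "increased sensitivity to distal reward changes" the summary describes.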
Playing FPS Games with Deep Reinforcement Learning
This paper presents the first architecture to tackle 3D environments in first-person shooter games, that involve partially observable states, and substantially outperforms built-in AI agents of the game as well as average humans in deathmatch scenarios.
Asynchronous Methods for Deep Reinforcement Learning
A conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers and shows that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using a visual input.
Memory Approaches to Reinforcement Learning in Non-Markovian Domains
This paper studies three connectionist approaches which learn to use history to handle perceptual aliasing: the window-Q, recurrent-Q, and recurrent-model architectures.
Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction
Results using Horde on a multi-sensored mobile robot to successfully learn goal-oriented behaviors and long-term predictions from off-policy experience are presented.
Playing Atari with Deep Reinforcement Learning
This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning, which outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
Universal Value Function Approximators
An efficient technique for supervised learning of universal value function approximators (UVFAs) V(s, g; θ) that generalise not just over states s but also over goals g is developed, and it is demonstrated that a UVFA can successfully generalise to previously unseen goals.
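Conditioning the value function on the goal g is what enables generalisation to unseen goals. A toy sketch, with a hand-picked 1-D chain ground truth and a simple linear-in-features approximator standing in for the paper's learned V(s, g; θ) (all assumptions, not the paper's setup):

```python
import numpy as np

gamma = 0.9

# Ground truth on a 1-D chain: reward 1 at the goal, one step per tick,
# so the optimal value is gamma ** |s - g|.
def v_true(s, g):
    return gamma ** abs(s - g)

# Hypothetical UVFA: a linear model over simple joint (s, g) features.
def features(s, g):
    d = abs(s - g)
    return np.array([1.0, d, d * d])

# Train on every goal except g = 7.
train = [(s, g) for s in range(10) for g in range(10) if g != 7]
X = np.array([features(s, g) for s, g in train])
y = np.array([v_true(s, g) for s, g in train])
theta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Because the approximator takes g as input, it can be evaluated on the
# held-out goal g = 7, which never appeared during training.
err = max(abs(float(features(s, 7) @ theta) - v_true(s, 7))
          for s in range(10))
print(err)
```

A value function indexed only by s would have nothing to say about a new goal; the joint (s, g) parameterisation is what makes the held-out evaluation meaningful.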
Learning to Navigate in Complex Environments
This work considers jointly learning the goal-driven reinforcement learning problem with auxiliary depth prediction and loop closure classification tasks and shows that data efficiency and task performance can be dramatically improved by relying on additional auxiliary tasks leveraging multimodal sensory inputs.