Corpus ID: 2325395

Classifying Options for Deep Reinforcement Learning

@article{Arulkumaran2016ClassifyingOF,
  title={Classifying Options for Deep Reinforcement Learning},
  author={Kai Arulkumaran and Nat Dilokthanakul and Murray Shanahan and Anil Anthony Bharath},
  journal={ArXiv},
  year={2016},
  volume={abs/1604.08153}
}
In this paper we combine one method for hierarchical reinforcement learning - the options framework - with deep Q-networks (DQNs) through the use of different "option heads" on the policy network, and a supervisory network for choosing between the different options. We utilise our setup to investigate the effects of architectural constraints in subtasks with positive and negative transfer, across a range of network capacities. We empirically show that our augmented DQN has lower sample… 
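The architecture described in the abstract can be pictured with a minimal sketch. It assumes a shared trunk, one Q-value head per option, and a separate supervisory network that picks the option. The class name `OptionHeadsQNetwork`, the layer sizes, the random initialisation and the greedy selection are all hypothetical choices made for illustration; they are not taken from the paper, and no training step is shown.

```python
import random


def linear(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(layer, x, relu=False):
    """Apply a dense layer to the vector x, optionally followed by a ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


def argmax(values):
    """Index of the largest entry in a non-empty list."""
    return max(range(len(values)), key=values.__getitem__)


class OptionHeadsQNetwork:
    """Hypothetical sketch: shared trunk, one Q-value head per option,
    and a supervisory network that chooses which head to use."""

    def __init__(self, n_inputs, n_hidden, n_actions, n_options, seed=0):
        rng = random.Random(seed)
        self.trunk = linear(n_inputs, n_hidden, rng)
        # One output head per option, all reading the same trunk features.
        self.heads = [linear(n_hidden, n_actions, rng) for _ in range(n_options)]
        # Assumption: the supervisor is a separate single-layer network.
        self.supervisor = linear(n_inputs, n_options, rng)

    def select_option(self, state):
        """Greedily pick the option with the highest supervisor score."""
        return argmax(forward(self.supervisor, state))

    def q_values(self, state, option):
        """Q-values for every action under the given option's head."""
        features = forward(self.trunk, state, relu=True)
        return forward(self.heads[option], features)

    def act(self, state):
        """Choose an option, then the greedy action under that option."""
        option = self.select_option(state)
        return option, argmax(self.q_values(state, option))


# Usage: two options, three primitive actions, a four-dimensional state.
net = OptionHeadsQNetwork(n_inputs=4, n_hidden=8, n_actions=3, n_options=2)
option, action = net.act([0.1, -0.2, 0.3, 0.0])
```

Sharing the trunk lets all options reuse the same features, while separate heads keep each option's action values apart; whether that helps or hurts presumably depends on how much transfer there is between subtasks, which is the question the abstract says the paper investigates.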

Deep Reinforcement Learning With Macro-Actions
TLDR
This paper focuses on macro-actions, and evaluates these on different Atari 2600 games, where they yield significant improvements in learning speed and can even achieve better scores than DQN.
Hypothesis-Driven Skill Discovery for Hierarchical Deep Reinforcement Learning
TLDR
This work proposes the Hypothesis Proposal and Evaluation (HyPE) algorithm, which discovers objects from raw pixel data, generates hypotheses about the controllability of observed changes in object state, and learns a hierarchy of skills to test these hypotheses.
Towards Deep Symbolic Reinforcement Learning
TLDR
It is shown that the resulting system -- though just a prototype -- learns effectively, and, by acquiring a set of symbolic rules that are easily comprehensible to humans, dramatically outperforms a conventional, fully neural DRL system on a stochastic variant of the game.
A Deep Hierarchical Reinforcement Learning Algorithm in Partially Observable Markov Decision Processes
TLDR
This paper proposes a deep hierarchical reinforcement learning algorithm for learning in hierarchical POMDPs, in which the tasks are only partially observable and possess hierarchical properties.
A Brief Survey of Deep Reinforcement Learning
TLDR
This survey will cover central algorithms in deep reinforcement learning, including the deep Q-network, trust region policy optimisation, and asynchronous advantage actor-critic, and highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning.
Deep Reinforcement Learning: A Brief Survey
TLDR
This survey will cover central algorithms in deep RL, including the deep Q-network (DQN), trust region policy optimization (TRPO), and asynchronous advantage actor critic, and highlight the unique advantages of deep neural networks, focusing on visual understanding via RL.
Combo-Action: Training Agent For FPS Game with Auxiliary Tasks
TLDR
A novel method, referred to as Combo-Action, that plans over temporally extended action sequences to compress the action space and outperforms previous state-of-the-art approaches by a large margin.
Integrating Multiple Policies for Person-Following Robot Training Using Deep Reinforcement Learning
TLDR
A DRL-based method is proposed for training an agent capable of selecting the appropriate policy for the current state of the environment from a set of previously trained optimal policies, for a given task that can be decomposed into subtasks.
Deep Reinforcement Learning Issues and Approaches for The Multi-Agent Centric Problems
TLDR
In this paper, deep reinforcement learning algorithms and their applications are reviewed and categorized, and the advantages and disadvantages of the algorithms, along with the challenges solved by the emergence of deep reinforcement learning, are addressed.
Towards Lifelong Self-Supervision: A Deep Learning Direction for Robotics
TLDR
This manuscript surveys recent work in the literature that pertains to applying deep learning systems to the robotics domain, either as a means of estimation or as a tool to resolve motor commands directly from raw percepts, and suggests that deep learning as a tool alone is insufficient for building a unified framework to acquire general intelligence.

References

Policy Distillation
TLDR
A novel method called policy distillation is presented that can be used to extract the policy of a reinforcement learning agent and train a new network that performs at the expert level while being dramatically smaller and more efficient.
Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning
TLDR
This work defines a novel method of multitask and transfer learning that enables an autonomous agent to learn how to behave in multiple tasks simultaneously, and then generalize its knowledge to new domains, and uses Atari games as a testing environment to demonstrate these methods.
Graying the black box: Understanding DQNs
TLDR
This paper describes the policies learned by DQNs for three different Atari 2600 games and suggests ways to interpret, debug and optimize deep neural networks in reinforcement learning.
Deep Reinforcement Learning with Double Q-Learning
TLDR
This paper proposes a specific adaptation to the DQN algorithm and shows that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games.
Hierarchical Relative Entropy Policy Search
TLDR
This work defines the problem of learning sub-policies in continuous state-action spaces as finding a hierarchical policy that is composed of a high-level gating policy to select the low-level sub-policies for execution by the agent, and treats them as latent variables, which allows for distribution of the update information between the sub-policies.
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
TLDR
h-DQN is presented, a framework to integrate hierarchical value functions, operating at different temporal scales, with intrinsically motivated deep reinforcement learning, and allows for flexible goal specifications, such as functions over entities and relations.
End-to-End Training of Deep Visuomotor Policies
TLDR
This paper develops a method that can be used to learn policies that map raw image observations directly to torques at the robot's motors, trained using a partially observed guided policy search method, with supervision provided by a simple trajectory-centric reinforcement learning method.
Universal Value Function Approximators
TLDR
An efficient technique is developed for supervised learning of universal value function approximators (UVFAs) V(s, g; θ) that generalise not just over states s but also over goals g, and it is demonstrated that a UVFA can successfully generalise to previously unseen goals.
Deep Exploration via Bootstrapped DQN
Efficient exploration in complex environments remains a major challenge for reinforcement learning. We propose bootstrapped DQN, a simple algorithm that explores in a computationally and