Corpus ID: 17581979

Improving Scalability of Reinforcement Learning by Separation of Concerns

@article{Seijen2016ImprovingSO,
  title={Improving Scalability of Reinforcement Learning by Separation of Concerns},
  author={Harm van Seijen and Mehdi Fatemi and Joshua Romoff and Romain Laroche},
  journal={ArXiv},
  year={2016},
  volume={abs/1612.05159}
}
In this paper, we propose a framework for solving a single-agent task by using multiple agents, each focusing on different aspects of the task. This approach has two main advantages: 1) it allows for specialized agents for different parts of the task, and 2) it provides a new way to transfer knowledge, by transferring trained agents. Our framework generalizes the traditional hierarchical decomposition, in which, at any moment in time, a single agent has control until it has solved its… 
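The abstract's core idea (multiple agents, each responsible for a different aspect of one task, acting concurrently rather than one-at-a-time) can be illustrated with a minimal sketch. This is not the paper's algorithm: the names, the coordinate-wise task decomposition, and the per-agent reward channels are all illustrative assumptions, and the communication actions the framework also supports are omitted.

```python
import random

class QAgent:
    """Minimal tabular Q-learning agent that sees only its own state slice.
    Hypothetical illustration of a single 'concern' in the framework."""
    def __init__(self, actions, alpha=0.5, gamma=0.9, eps=0.1):
        self.q, self.actions = {}, actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, s):
        if random.random() < self.eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q.get((s, a), 0.0))

    def learn(self, s, a, r, s2):
        best = max(self.q.get((s2, b), 0.0) for b in self.actions)
        old = self.q.get((s, a), 0.0)
        self.q[(s, a)] = old + self.alpha * (r + self.gamma * best - old)

def train(episodes=300, size=5, goal=(4, 4)):
    """Two agents jointly solve one 2-D navigation task: agent_x moves
    along x and observes only x; agent_y does the same for y. Unlike a
    hierarchical decomposition, both act at every time step."""
    random.seed(0)
    agent_x, agent_y = QAgent([-1, 0, 1]), QAgent([-1, 0, 1])
    for _ in range(episodes):
        x, y = 0, 0
        for _ in range(30):
            ax, ay = agent_x.act(x), agent_y.act(y)
            x2 = min(max(x + ax, 0), size - 1)
            y2 = min(max(y + ay, 0), size - 1)
            # Separate reward channels: each agent's own concern.
            rx = 1.0 if x2 == goal[0] else -0.1
            ry = 1.0 if y2 == goal[1] else -0.1
            agent_x.learn(x, ax, rx, x2)
            agent_y.learn(y, ay, ry, y2)
            x, y = x2, y2
    return agent_x, agent_y
```

Because each agent's state and action space is a slice of the joint problem, the two 5-state subproblems are learned far faster than the joint 25-state problem, which is the scalability argument the title refers to.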

Citations

Algorithm Selection for Reinforcement Learning
A novel meta-algorithm, called Epochal Stochastic Bandit Algorithm Selection (ESBAS), to freeze the policy updates at each epoch, and to leave a rebooted stochastic bandit in charge of the algorithm selection.
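The ESBAS idea summarized above can be sketched as follows: time is cut into epochs, policies are frozen within an epoch, and a freshly rebooted stochastic bandit (UCB1 here) selects which algorithm's policy controls each episode. All names and the doubling epoch lengths are illustrative assumptions, not the paper's exact construction.

```python
import math
import random

def esbas_like(algorithms, run_episode, num_epochs=4, base_len=8):
    """Sketch of epoch-wise bandit algorithm selection (after ESBAS).
    `algorithms` maps name -> frozen policy; `run_episode(policy)` returns
    an episode return. A fresh UCB1 bandit is started ('rebooted') at each
    epoch boundary; returns the sequence of selected algorithm names."""
    history = []
    for epoch in range(num_epochs):
        counts = {name: 0 for name in algorithms}   # rebooted bandit state
        sums = {name: 0.0 for name in algorithms}
        n_episodes = base_len * 2 ** epoch          # illustrative epoch schedule
        for t in range(1, n_episodes + 1):
            # UCB1: try every arm once, then maximize mean + exploration bonus.
            untried = [n for n in algorithms if counts[n] == 0]
            if untried:
                name = untried[0]
            else:
                name = max(algorithms,
                           key=lambda n: sums[n] / counts[n]
                           + math.sqrt(2 * math.log(t) / counts[n]))
            ret = run_episode(algorithms[name])
            counts[name] += 1
            sums[name] += ret
            history.append(name)
        # (In the full method, each algorithm would update its policy on the
        #  data gathered here before the next epoch begins.)
    return history
```

Freezing updates within an epoch is what makes the bandit's arms stationary, so standard bandit regret guarantees apply inside each epoch.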
Adaptive Regret Minimization for Learning Complex Team-Based Tactics
An approach and analysis for performing decentralized cooperative control of a team of decoys to achieve the Honeypot Ambush tactic and the numerical results verify the effectiveness of the proposed solution to achieve a global satisfaction outcome and to adapt to a wide spectrum of scenarios.
Algorithm selection of off-policy reinforcement learning algorithm
The article presents a novel meta-algorithm, called Epochal Stochastic Bandit Algorithm Selection (ESBAS), to freeze the policy updates at each epoch, and to leave a rebooted stochastic bandit in charge of the algorithm selection.

References

SHOWING 1-10 OF 23 REFERENCES
The Option-Critic Architecture
This work derives policy gradient theorems for options and proposes a new option-critic architecture capable of learning both the internal policies and the termination conditions of options, in tandem with the policy over options, and without the need to provide any additional rewards or subgoals.
Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density
This paper presents a method by which a reinforcement learning agent can automatically discover certain types of subgoals online and is able to accelerate learning on the current task and to transfer its expertise to other, related tasks through the reuse of its ability to attain subgoals.
Recent Advances in Hierarchical Reinforcement Learning
This work reviews several approaches to temporal abstraction and hierarchical organization that machine learning researchers have recently developed and discusses extensions of these ideas to concurrent activities, multiagent coordination, and hierarchical memory for addressing partial observability.
Learning to Communicate with Deep Multi-Agent Reinforcement Learning
By embracing deep neural networks, this work is able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability.
A Framework for Multi-Paradigmatic Learning
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
h-DQN is presented, a framework to integrate hierarchical value functions, operating at different temporal scales, with intrinsically motivated deep reinforcement learning, and allows for flexible goal specifications, such as functions over entities and relations.
Multi-agent Reinforcement Learning: An Overview
This chapter reviews a representative selection of multi-agent reinforcement learning algorithms for fully cooperative, fully competitive, and more general (neither cooperative nor competitive) tasks.
Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning
The Q-Cut algorithm is presented, a graph-theoretic approach for automatic detection of sub-goals in a dynamic environment, which is used for acceleration of the Q-Learning algorithm, and extended to the Segmented Q-Cut algorithm, which uses previously identified bottlenecks for state-space partitioning, necessary for finding additional bottlenecks in complex environments.
Identifying useful subgoals in reinforcement learning by local graph partitioning
We present a new subgoal-based method for automatically creating useful skills in reinforcement learning. Our method identifies subgoals by partitioning local state transition graphs---those that are
...