Improving Scalability of Reinforcement Learning by Separation of Concerns
@article{Seijen2016ImprovingSO, title={Improving Scalability of Reinforcement Learning by Separation of Concerns}, author={Harm van Seijen and Mehdi Fatemi and Joshua Romoff and Romain Laroche}, journal={ArXiv}, year={2016}, volume={abs/1612.05159} }
In this paper, we propose a framework for solving a single-agent task by using multiple agents, each focusing on different aspects of the task. This approach has two main advantages: 1) it allows for specialized agents for different parts of the task, and 2) it provides a new way to transfer knowledge, by transferring trained agents. Our framework generalizes the traditional hierarchical decomposition, in which, at any moment in time, a single agent has control until it has solved its…
3 Citations
Algorithm Selection for Reinforcement Learning
- Computer Science
- 2017
A novel meta-algorithm, called Epochal Stochastic Bandit Algorithm Selection (ESBAS), is proposed; it freezes the policy updates at each epoch and leaves a rebooted stochastic bandit in charge of the algorithm selection.
Adaptive Regret Minimization for Learning Complex Team-Based Tactics
- Computer Science
- IEEE Access
- 2019
An approach and analysis for decentralized cooperative control of a team of decoys to achieve the Honeypot Ambush tactic; numerical results verify that the proposed solution achieves a global satisfaction outcome and adapts to a wide spectrum of scenarios.
Algorithm selection of off-policy reinforcement learning algorithm
- Computer Science
- ArXiv
- 2017
The article presents a novel meta-algorithm, called Epochal Stochastic Bandit Algorithm Selection (ESBAS), that freezes the policy updates at each epoch and leaves a rebooted stochastic bandit in charge of the algorithm selection.
References
The Option-Critic Architecture
- Computer Science
- AAAI
- 2017
This work derives policy gradient theorems for options and proposes a new option-critic architecture capable of learning both the internal policies and the termination conditions of options, in tandem with the policy over options, and without the need to provide any additional rewards or subgoals.
Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density
- Computer Science
- ICML
- 2001
This paper presents a method by which a reinforcement learning agent can automatically discover certain types of subgoals online and is able to accelerate learning on the current task and to transfer its expertise to other, related tasks through the reuse of its ability to attain subgoals.
Recent Advances in Hierarchical Reinforcement Learning
- Computer Science
- Discret. Event Dyn. Syst.
- 2003
This work reviews several approaches to temporal abstraction and hierarchical organization that machine learning researchers have recently developed and discusses extensions of these ideas to concurrent activities, multiagent coordination, and hierarchical memory for addressing partial observability.
Learning to Communicate with Deep Multi-Agent Reinforcement Learning
- Computer Science
- NIPS
- 2016
By embracing deep neural networks, this work is able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability.
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
- Computer Science
- NIPS
- 2016
This work presents h-DQN, a framework that integrates hierarchical value functions, operating at different temporal scales, with intrinsically motivated deep reinforcement learning, and that allows for flexible goal specifications, such as functions over entities and relations.
Multi-agent Reinforcement Learning: An Overview
- Computer Science
- 2010
This chapter reviews a representative selection of multi-agent reinforcement learning algorithms for fully cooperative, fully competitive, and more general (neither cooperative nor competitive) tasks.
Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning
- Computer Science
- ECML
- 2002
The Q-Cut algorithm is presented, a graph theoretic approach for automatic detection of sub-goals in a dynamic environment, which is used for acceleration of the Q-Learning algorithm, and extended to the Segmented Q-Cut algorithm, which uses previously identified bottlenecks for state space partitioning, necessary for finding additional bottlenecks in complex environments.
Identifying useful subgoals in reinforcement learning by local graph partitioning
- Computer Science
- ICML
- 2005
We present a new subgoal-based method for automatically creating useful skills in reinforcement learning. Our method identifies subgoals by partitioning local state transition graphs---those that are…