Multi-Advisor Reinforcement Learning
@article{Laroche2017MultiAdvisorRL,
  title   = {Multi-Advisor Reinforcement Learning},
  author  = {Romain Laroche and Mehdi Fatemi and Joshua Romoff and Harm van Seijen},
  journal = {ArXiv},
  year    = {2017},
  volume  = {abs/1704.00756}
}
We consider tackling a single-agent RL problem by distributing it to $n$ learners. These learners, called advisors, each endeavour to solve the problem from a different focus. Their advice, taking the form of action values, is then communicated to an aggregator, which is in control of the system. We show that the local planning method for the advisors is critical and that none of the ones found in the literature is flawless: the egocentric planning overestimates values of states where the other…
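The aggregation scheme the abstract describes can be sketched as follows. This is an illustrative reconstruction, not the paper's code: function names, shapes, and the example numbers are assumptions, and the aggregator here simply sums the advisors' action values and acts greedily.

```python
import numpy as np

# Hypothetical setup: n advisors each report action values Q_j(s, a)
# for the current state; the aggregator sums them and acts greedily.

def aggregate_action_values(advisor_qs):
    """Sum the advisors' per-action value vectors for the current state."""
    return np.sum(advisor_qs, axis=0)

def aggregator_action(advisor_qs):
    """Greedy action with respect to the aggregated action values."""
    return int(np.argmax(aggregate_action_values(advisor_qs)))

# Example: 3 advisors, 4 actions
qs = np.array([[1.0, 0.5, 0.0, 0.2],
               [0.1, 0.9, 0.3, 0.0],
               [0.0, 0.4, 0.2, 1.1]])
# aggregated values: [1.1, 1.8, 0.5, 1.3] -> greedy action 1
```

The paper's point is that the interesting design question is not this aggregation step but how each advisor plans locally (egocentric vs. other schemes), since that determines what the summed values mean.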
15 Citations
A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning
- Computer Science, AAMAS
- 2019
It is shown that the search for an optimal exploration strategy can be formulated as a reinforcement learning problem itself, and it is demonstrated that such a strategy can leverage patterns found in the structure of related problems.
Options in Multi-task Reinforcement Learning - Transfer via Reflection
- Computer Science, Canadian Conference on AI
- 2019
This work provides theoretical and empirical results demonstrating that when a set of landmark states covers the state space suitably, then a LOVR agent that learns optimal value functions for these in an initial phase and deploys the associated optimal policies as options in the main phase, can achieve a drastic reduction in cumulative regret compared to baseline approaches.
Q-Learning Acceleration via State-Space Partitioning
- Computer Science, 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA)
- 2018
It is demonstrated in this paper that state-space partitioning among agents can be realized by reward design without hard-coded rules, and that it can be used to accelerate learning in both structured and arbitrarily structured state domains.
On Value Function Representation of Long Horizon Problems
- Computer Science, AAAI
- 2018
The generalized Rademacher complexity of the hypothesis space of all optimal value functions is shown to depend on the planning horizon but to be independent of the state- and action-space sizes; bounds on the action gaps of action-value functions are also presented.
Algorithm Selection for Reinforcement Learning
- Computer Science
- 2017
A novel meta-algorithm, called Epochal Stochastic Bandit Algorithm Selection (ESBAS), to freeze the policy updates at each epoch, and to leave a rebooted stochastic bandit in charge of the algorithm selection.
Value Function Decomposition for Iterative Design of Reinforcement Learning Agents
- Computer Science, ArXiv
- 2022
This paper shows how to integrate value decomposition into a broad class of actor-critic algorithms and use it to assist in the iterative agent-design process and provides several demonstrations of decomposition’s use in identifying and addressing problems in the design of both environments and agents.
On mechanisms for transfer using landmark value functions in multi-task lifelong reinforcement learning
- Computer Science, ArXiv
- 2019
A novel topological landmark covering confers beneficial theoretical results, bounding the Q-values at each state-action pair, and a mechanism is introduced that prunes infeasible actions which cannot possibly be part of an optimal policy for the current goal.
Danger-Aware Adaptive Composition of DRL Agents for Self-Navigation
- Computer Science, Unmanned Syst.
- 2021
A novel danger-aware adaptive composition (DAAC) framework is proposed to combine two individually DRL-trained agents, obstacle avoidance and goal reaching, into a navigation agent without any redesign or retraining.
Multi-advisor deep reinforcement learning for smart home energy control
- Computer Science
A multi-agent multi-advisor reinforcement learning system to handle the consumer’s time-varying preferences across objectives and identifies the need for stronger performance measures for a system of this type by considering the effect on agents of newly selected preferences.
In reinforcement learning, all objective functions are not equal
- Computer Science, ICLR
- 2018
This work gets reward back-propagation out of the way by directly fitting a deep neural network to the analytically computed optimal value function, given a chosen objective function.
References
Showing 1-10 of 41 references
Separation of Concerns in Reinforcement Learning
- Computer Science
- 2016
This paper proposes a framework for solving a single-agent task by using multiple agents, each focusing on different aspects of the task, which generalizes the traditional hierarchical decomposition.
Feudal Reinforcement Learning
- Business, NIPS
- 1992
This paper shows how to create a Q-learning managerial hierarchy in which high-level managers learn how to set tasks for their submanagers, who, in turn, learn how to satisfy them.
Multi-Agent Reinforcement Learning: A Critical Survey
- Computer Science
- 2003
The recent work in AI on multi-agent reinforcement learning is surveyed and it is argued that, while exciting, this work is flawed; the fundamental flaw is unclarity about the problem or problems being addressed.
Off-Policy Reward Shaping with Ensembles
- Computer Science, ArXiv
- 2015
A PBRS framework that reduces learning speed but does not incur extra sample complexity is formulated, and it is demonstrated empirically that an ensemble policy outperforms both the base policy and its single-heuristic components, and that an ensemble over a range of scales performs at least as well as one with optimally tuned components.
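The mechanism underlying this reference is potential-based reward shaping (PBRS), in which a state potential Phi augments the environment reward without changing the optimal policy. A minimal sketch, where the potential function and the example numbers are purely illustrative:

```python
# Potential-based reward shaping: the shaped reward adds
# gamma * Phi(s') - Phi(s) to the environment reward r. Any
# choice of Phi leaves the optimal policy unchanged.

GAMMA = 0.99

def shaped_reward(r, s, s_next, phi):
    """Return r + gamma * Phi(s') - Phi(s) for a transition (s, r, s')."""
    return r + GAMMA * phi(s_next) - phi(s)

# Illustrative potential: negative distance to a goal state at 10,
# so moving toward the goal yields a positive shaping bonus.
phi = lambda s: -abs(10 - s)

# Moving from state 3 to state 4 with base reward 0:
# shaped = 0 + 0.99 * (-6) - (-7) = 1.06
```

An ensemble in this setting maintains one shaping heuristic per member and combines the resulting policies, which is what the cited empirical comparison evaluates.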
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
- Computer Science, Artif. Intell.
- 1999
Reinforcement Learning with Hierarchies of Machines
- Computer Science, NIPS
- 1997
This work presents provably convergent algorithms for problem-solving and learning with hierarchical machines and demonstrates their effectiveness on a problem with several thousand states.
Q-Decomposition for Reinforcement Learning Agents
- Computer Science, ICML
- 2003
The paper explores a very simple agent design method called Q-decomposition, wherein a complex agent is built from simpler subagents. Each subagent has its own reward function and runs its own…
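The Q-decomposition design described here can be sketched in tabular form. This is an illustrative reconstruction under stated assumptions (table sizes, learning rate, and reward split are made up): each subagent keeps its own Q-table for its own reward component, the arbiter acts greedily on the sum, and subagents are updated on-policy with the arbiter's actions.

```python
import numpy as np

# Tabular Q-decomposition sketch: one Q-table per subagent, each
# trained with Sarsa on its own reward component, using the actions
# actually taken by the arbiter.

N_STATES, N_ACTIONS, N_SUB = 5, 2, 3
ALPHA, GAMMA = 0.1, 0.9
Q = np.zeros((N_SUB, N_STATES, N_ACTIONS))  # per-subagent tables

def arbiter_action(s):
    """Greedy action with respect to the summed subagent Q-values."""
    return int(np.argmax(Q[:, s, :].sum(axis=0)))

def sarsa_update(s, a, rewards, s_next, a_next):
    """rewards[j] is subagent j's reward component for this transition."""
    for j in range(N_SUB):
        td = rewards[j] + GAMMA * Q[j, s_next, a_next] - Q[j, s, a]
        Q[j, s, a] += ALPHA * td
```

The on-policy (Sarsa) update on the arbiter's action, rather than an independent Q-learning update per subagent, is the detail the paper argues matters for sensible combined behaviour.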
Multiple-Goal Reinforcement Learning with Modular Sarsa(0)
- Computer Science, IJCAI
- 2003
A new algorithm, GM-Sarsa(0), for finding approximate solutions to multiple-goal reinforcement learning problems modeled as composite Markov decision processes; it finds good policies with respect to the composite task.
Ensemble Methods for Reinforcement Learning with Function Approximation
- Computer Science, MCS
- 2011
This paper proposes several ensemble methods to learn a combined parameterized state-value function of multiple agents and applies these methods to the simple pencil-and-paper game Tic-Tac-Toe and empirically shows that the learning speed is faster and the resulting policy is better than that of a single agent.
FeUdal Networks for Hierarchical Reinforcement Learning
- Computer Science, ICML
- 2017
We introduce FeUdal Networks (FuNs): a novel architecture for hierarchical reinforcement learning. Our approach is inspired by the feudal reinforcement learning proposal of Dayan and Hinton, and…