• Corpus ID: 12099821

Multi-Advisor Reinforcement Learning

Romain Laroche, Mehdi Fatemi, Joshua Romoff, Harm van Seijen
We consider tackling a single-agent RL problem by distributing it to $n$ learners. These learners, called advisors, endeavour to solve the problem from a different focus. Their advice, taking the form of action values, is then communicated to an aggregator, which is in control of the system. We show that the local planning method for the advisors is critical and that none of the ones found in the literature is flawless: the egocentric planning overestimates values of states where the other… 
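The advisor/aggregator arrangement described above can be illustrated with a minimal sketch. The abstract states only that advisors communicate action values to an aggregator; the weighted-sum aggregation, the greedy action choice, and the names `Advisor`, `aggregate`, and `act` are assumptions made for illustration and are not taken from the paper.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical type: an advisor maps a state to one value per action.
Advisor = Callable[[int], Sequence[float]]


def aggregate(advisors: Sequence[Advisor], state: int,
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Combine the advisors' action values by a weighted sum.

    The weighted sum is an assumed aggregation rule, chosen for simplicity.
    """
    advice = [list(advisor(state)) for advisor in advisors]
    if weights is None:
        weights = [1.0] * len(advice)
    n_actions = len(advice[0])
    return [sum(w * q[a] for w, q in zip(weights, advice))
            for a in range(n_actions)]


def act(advisors: Sequence[Advisor], state: int,
        weights: Optional[Sequence[float]] = None) -> int:
    """The aggregator controls the system: pick the action with the highest total."""
    totals = aggregate(advisors, state, weights)
    return max(range(len(totals)), key=totals.__getitem__)


# Two toy advisors with fixed (state-independent) action values.
def advisor_a(state: int) -> Sequence[float]:
    return [1.0, 0.0, 0.5]


def advisor_b(state: int) -> Sequence[float]:
    return [0.0, 0.75, 1.0]


chosen = act([advisor_a, advisor_b], state=0)
```

Here neither advisor's individually preferred action wins on its own terms for both: the aggregated values are `[1.0, 0.75, 1.5]`, so the aggregator selects action 2. How each advisor computes its values (its local planning method) is exactly what the abstract identifies as critical, and this sketch leaves it out.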

A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning

It is shown that the search for an optimal exploration strategy can itself be formulated as a reinforcement learning problem, and demonstrated that such a strategy can leverage patterns found in the structure of related problems.

Options in Multi-task Reinforcement Learning - Transfer via Reflection

This work provides theoretical and empirical results demonstrating that, when a set of landmark states suitably covers the state space, a LOVR agent that learns optimal value functions for these landmarks in an initial phase and deploys the associated optimal policies as options in the main phase can achieve a drastic reduction in cumulative regret compared to baseline approaches.

Q-Learning Acceleration via State-Space Partitioning

This paper demonstrates that state-space partitioning among agents can be realized by reward design without hard-coded rules, and that it can be used to accelerate learning in both structured and arbitrarily structured state domains.

On Value Function Representation of Long Horizon Problems

The generalized Rademacher complexity of the hypothesis space of all optimal value functions is shown to depend on the planning horizon and to be independent of the state and action space sizes; bounds on the action-gaps of action-value functions are also presented.

Algorithm Selection for Reinforcement Learning

A novel meta-algorithm, called Epochal Stochastic Bandit Algorithm Selection (ESBAS), is proposed; it freezes the policy updates at each epoch and leaves a rebooted stochastic bandit in charge of the algorithm selection.

Value Function Decomposition for Iterative Design of Reinforcement Learning Agents

This paper shows how to integrate value decomposition into a broad class of actor-critic algorithms and use it to assist in the iterative agent-design process and provides several demonstrations of decomposition’s use in identifying and addressing problems in the design of both environments and agents.

On mechanisms for transfer using landmark value functions in multi-task lifelong reinforcement learning

A novel topological landmark covering confers beneficial theoretical results, bounding the Q values at each state-action pair, and introduces a mechanism that prunes infeasible actions which cannot possibly be part of an optimal policy for the current goal.

Danger-Aware Adaptive Composition of DRL Agents for Self-Navigation

A novel danger-aware adaptive composition (DAAC) framework is proposed to combine two individually DRL-trained agents, obstacle-avoidance and goal-reaching, to construct a navigation agent without any redesigning and retraining.

Multi-advisor deep reinforcement learning for smart home energy control

A multi-agent multi-advisor reinforcement learning system is proposed to handle the consumer’s time-varying preferences across objectives, and the need for stronger performance measures for a system of this type is identified by considering the effect on agents of newly selected preferences.

In reinforcement learning, all objective functions are not equal

This work gets the reward back propagation out of the way by fitting directly a deep neural network on the analytically computed optimal value function, given a chosen objective function.



Separation of Concerns in Reinforcement Learning

This paper proposes a framework for solving a single-agent task by using multiple agents, each focusing on different aspects of the task, which generalizes the traditional hierarchical decomposition.

Feudal Reinforcement Learning

This paper shows how to create a Q-learning managerial hierarchy in which high-level managers learn how to set tasks for their submanagers who, in turn, learn how to satisfy them.

Multi-Agent Reinforcement Learning: a critical survey

The recent work in AI on multi-agent reinforcement learning is surveyed and it is argued that, while exciting, this work is flawed; the fundamental flaw is unclarity about the problem or problems being addressed.

Off-Policy Reward Shaping with Ensembles

A PBRS framework that reduces learning speed, but does not incur extra sample complexity is formulated and it is demonstrated empirically that an ensemble policy outperforms both the base policy, and its single-heuristic components, and an ensemble over a range of scales performs at least as well as one with optimally tuned components.

Reinforcement Learning with Hierarchies of Machines

This work presents provably convergent algorithms for problem-solving and learning with hierarchical machines and demonstrates their effectiveness on a problem with several thousand states.

Q-Decomposition for Reinforcement Learning Agents

The paper explores a very simple agent design method called Q-decomposition, wherein a complex agent is built from simpler subagents. Each subagent has its own reward function and runs its own

Multiple-Goal Reinforcement Learning with Modular Sarsa(0)

A new algorithm, GM-Sarsa(0), for finding approximate solutions to multiple-goal reinforcement learning problems that are modeled as composite Markov decision processes, which finds good policies in the context of the composite task.

Ensemble Methods for Reinforcement Learning with Function Approximation

This paper proposes several ensemble methods to learn a combined parameterized state-value function of multiple agents and applies these methods to the simple pencil-and-paper game Tic-Tac-Toe and empirically shows that the learning speed is faster and the resulting policy is better than that of a single agent.

FeUdal Networks for Hierarchical Reinforcement Learning

We introduce FeUdal Networks (FuNs): a novel architecture for hierarchical reinforcement learning. Our approach is inspired by the feudal reinforcement learning proposal of Dayan and Hinton, and