Corpus ID: 238253461

Decentralized Graph-Based Multi-Agent Reinforcement Learning Using Reward Machines

Jueming Hu, Zhe Xu, Weichang Wang, Guannan Qu, Yutian Pang, Yongming Liu
In multi-agent reinforcement learning (MARL), it is challenging for a collection of agents to learn complex temporally extended tasks. The difficulties lie in the computational complexity and in learning the high-level structure behind reward functions. We study the graph-based Markov Decision Process (MDP), in which the dynamics of neighboring agents are coupled. We use a reward machine (RM) to encode each agent's task and to expose the internal structure of the reward function. RMs have the capacity to describe high…
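The reward machine concept from the abstract can be illustrated with a minimal sketch: a finite-state machine whose transitions fire on high-level events and emit rewards, making the temporal structure of the task explicit. The class and event names below are illustrative, not the paper's code.

```python
class RewardMachine:
    """A finite-state machine over high-level events that emits rewards."""

    def __init__(self, transitions, initial_state, terminal_states):
        # transitions: {(state, event): (next_state, reward)}
        self.transitions = transitions
        self.state = initial_state
        self.terminal_states = terminal_states

    def step(self, event):
        """Advance on an observed event; unlisted events self-loop with 0 reward."""
        next_state, reward = self.transitions.get((self.state, event), (self.state, 0.0))
        self.state = next_state
        return reward

    def done(self):
        return self.state in self.terminal_states

# The temporally extended task "visit A, then B", which a plain Markovian
# reward function cannot express without extra state:
rm = RewardMachine(
    transitions={("u0", "at_A"): ("u1", 0.0), ("u1", "at_B"): ("u2", 1.0)},
    initial_state="u0",
    terminal_states={"u2"},
)
rm.step("at_B")           # too early: no progress, reward 0.0
rm.step("at_A")           # advance to u1
reward = rm.step("at_B")  # task complete, reward 1.0
```

Because the RM state tracks task progress, pairing it with the environment state restores the Markov property for otherwise non-Markovian rewards.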

Reward Machines for Cooperative Multi-Agent Reinforcement Learning

The proposed novel interpretation of RMs in the multi-agent setting explicitly encodes required teammate interdependencies and independencies, allowing the team-level task to be decomposed into sub-tasks for individual agents, and provides a natural approach to decentralized learning.

Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents

This work appears to be the first study of fully decentralized MARL algorithms for networked agents with function approximation; the algorithms come with provable convergence guarantees and can be implemented in an online fashion.

Joint Inference of Reward Machines and Policies for Reinforcement Learning

An iterative algorithm that performs joint inference of reward machines and policies for RL (more specifically, Q-learning); the proposed algorithm is proved to converge almost surely to an optimal policy in the limit.

Scalable Reinforcement Learning of Localized Policies for Multi-Agent Networked Systems

A Scalable Actor-Critic (SAC) framework is proposed that exploits the network structure and finds a localized policy that is a $O(\rho^\kappa)$-approximation of a stationary point of the objective for some $\rho\in(0,1)$ with complexity that scales with the local state-action space size of the largest $\kappa$-hop neighborhood of the network.
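The $\kappa$-hop truncation underlying this result can be sketched directly: each agent's localized policy and critic depend only on state information within $\kappa$ hops on the network graph, so complexity scales with the largest $\kappa$-hop neighborhood rather than the full network. The helper below is a hypothetical illustration of that neighborhood computation, not the paper's implementation.

```python
from collections import deque

def kappa_hop_neighborhood(adj, agent, kappa):
    """Breadth-first search out to kappa hops; adj maps node -> neighbors."""
    seen = {agent}
    frontier = deque([(agent, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == kappa:
            continue  # do not expand beyond kappa hops
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return seen

# Line graph 0-1-2-3-4: agent 2 with kappa=1 sees only {1, 2, 3}.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(sorted(kappa_hop_neighborhood(line, 2, 1)))  # [1, 2, 3]
```

The size of this set, not the global state-action space, is what enters the stated complexity bound.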

Learning Reward Machines for Partially Observable Reinforcement Learning

It is shown that RMs can be learned from experience, instead of being specified by the user, and that the resulting problem decomposition can be used to effectively solve partially observable RL problems.

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

An adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination is presented.

Learning Non-Markovian Reward Models in MDPs

The approach is a careful combination of Angluin's L* active-learning algorithm for learning finite automata, testing techniques for establishing conformance of the finite-model hypothesis, and optimisation techniques for computing optimal strategies in Markovian (immediate-reward) MDPs.

Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning

This paper proposes reward machines, a type of finite state machine that supports the specification of reward functions while exposing reward function structure, and describes different methodologies to exploit this structure to support learning, including automated reward shaping, task decomposition, and counterfactual reasoning with off-policy learning.
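The automated reward shaping mentioned here can be sketched under the standard potential-based formulation: run value iteration over the reward machine's states alone (ignoring the environment) to obtain a potential for each RM state, then shape rewards with $F = \gamma\,\phi(u') - \phi(u)$. Function and variable names below are illustrative assumptions, not the paper's API.

```python
def rm_potentials(transitions, states, gamma=0.9, iters=100):
    """Value iteration over RM states only; transitions: {(u, event): (u_next, reward)}."""
    phi = {u: 0.0 for u in states}
    for _ in range(iters):
        for u in states:
            outgoing = [(u2, r) for (u1, _), (u2, r) in transitions.items() if u1 == u]
            if outgoing:
                phi[u] = max(r + gamma * phi[u2] for u2, r in outgoing)
    return phi

# The two-step task "visit A, then B": states nearer completion get higher potential.
transitions = {("u0", "at_A"): ("u1", 0.0), ("u1", "at_B"): ("u2", 1.0)}
phi = rm_potentials(transitions, ["u0", "u1", "u2"])
# phi["u1"] > phi["u0"], so the shaped reward nudges the agent toward progress
# without changing the optimal policy (potential-based shaping is policy-invariant).
```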

Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning

Q-Learning for Reward Machines (QRM) is presented, an algorithm which appropriately decomposes the reward machine and uses off-policy Q-learning to simultaneously learn subpolicies for its different components; QRM is guaranteed to converge to an optimal policy in the tabular case.
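The decomposition idea can be sketched as follows: keep one tabular Q-function per RM state, and replay each environment transition through every RM state counterfactually, so all subpolicies learn off-policy from the same experience. This is a simplified illustration of the QRM-style update with hypothetical names, not the paper's implementation.

```python
def qrm_update(Q, rm_transitions, s, a, s2, event, alpha=0.1, gamma=0.9, actions=(0, 1)):
    """One QRM-style update. Q: {rm_state: {(env_state, action): value}}."""
    for u in Q:
        # Counterfactual RM step: what reward/transition *would* this event
        # cause if the agent were in RM state u? Unlisted events self-loop.
        u2, r = rm_transitions.get((u, event), (u, 0.0))
        target = r + gamma * max(Q[u2].get((s2, b), 0.0) for b in actions)
        Q[u][(s, a)] = Q[u].get((s, a), 0.0) + alpha * (target - Q[u].get((s, a), 0.0))

# One experience updates every subpolicy at once:
Q = {"u0": {}, "u1": {}, "u2": {}}
rm_transitions = {("u0", "at_A"): ("u1", 0.0), ("u1", "at_B"): ("u2", 1.0)}
qrm_update(Q, rm_transitions, s=0, a=1, s2=0, event="at_B")
# Only u1's subtask is advanced by "at_B", so only Q["u1"] sees a positive target.
```

This reuse of experience across RM states is what makes the decomposition sample-efficient relative to learning a single monolithic policy.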