Corpus ID: 238583203

Multi-Agent MDP Homomorphic Networks

@article{Pol2022MultiAgentMH,
  title={Multi-Agent MDP Homomorphic Networks},
  author={Elise van der Pol and Herke van Hoof and Frans A. Oliehoek and Max Welling},
  journal={ArXiv},
  year={2022},
  volume={abs/2110.04495}
}
This paper introduces Multi-Agent MDP Homomorphic Networks, a class of networks that allows distributed execution using only local information, yet is able to share experience between global symmetries in the joint state-action space of cooperative multi-agent systems. In cooperative multi-agent systems, complex symmetries arise between different configurations of the agents and their local observations. For example, consider a group of agents navigating: rotating the state globally results in a… 
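
To make the symmetry-sharing idea concrete, below is a minimal numpy sketch for a hypothetical two-agent gridworld. It illustrates the underlying equivariance property, not the paper's network: a joint Q-function made invariant by group-averaging over the four global rotations returns the same value when the joint state and the joint action are rotated together. The grid size, agent positions, and action encoding are illustrative assumptions.

```python
import numpy as np

N = 5                                # grid size (toy example)
rng = np.random.default_rng(0)

def rot_pos(pos, n=N):
    """Rotate a grid cell (row, col) by 90 degrees clockwise."""
    r, c = pos
    return (c, n - 1 - r)

def rot_state(state):
    """Apply the same global rotation to every agent's position."""
    return tuple(rot_pos(p) for p in state)

def rot_actions(actions):
    """Rotate each agent's action; 0=up, 1=right, 2=down, 3=left."""
    return tuple((a + 1) % 4 for a in actions)

# A random joint Q-table over (agent1 pos, agent2 pos, joint action).
q_raw = rng.normal(size=(N, N, N, N, 4, 4))

def q_lookup(table, state, actions):
    (r1, c1), (r2, c2) = state
    a1, a2 = actions
    return table[r1, c1, r2, c2, a1, a2]

def q_symmetric(state, actions):
    """Group-average the raw Q over the 4 rotations, so experience is
    shared across all globally rotated configurations."""
    total, s, a = 0.0, state, actions
    for _ in range(4):
        total += q_lookup(q_raw, s, a)
        s, a = rot_state(s), rot_actions(a)
    return total / 4

s = ((1, 2), (4, 0))                 # positions of two agents
a = (0, 3)                           # their joint action
# Rotating the joint state and joint action together leaves Q unchanged.
assert np.isclose(q_symmetric(s, a),
                  q_symmetric(rot_state(s), rot_actions(a)))
```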

Equivariant Networks for Zero-Shot Coordination

A novel equivariant network architecture is presented for use in Dec-POMDPs that prevents agents from learning policies which break symmetries; it does so more effectively than prior methods and is used to improve on the state of the art for zero-shot coordination on the Hanabi benchmark.

Breaking the Curse of Dimensionality in Multiagent State Space: A Unified Agent Permutation Framework

A unified agent permutation framework that exploits the permutation invariance (PI) and permutation equivariance (PE) inductive biases to reduce the multiagent state space.
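
A minimal numpy sketch of the permutation-invariance/equivariance idea (sizes and names are illustrative assumptions, not this framework's architecture): a shared per-agent encoder is permutation equivariant, and pooling its outputs yields a joint representation that is identical for any ordering of the agents.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, obs_dim, hid = 4, 6, 8          # hypothetical sizes

# Shared weights applied per agent (permutation equivariant),
# followed by mean pooling (permutation invariant).
W1 = rng.normal(size=(obs_dim, hid))
W2 = rng.normal(size=(hid, hid))

def encode(obs):                           # obs: (n_agents, obs_dim)
    h = np.tanh(obs @ W1)                  # per-agent features, shared weights
    return np.tanh(h.mean(axis=0) @ W2)    # pooled, order-independent summary

obs = rng.normal(size=(n_agents, obs_dim))
perm = rng.permutation(n_agents)
# Reordering the agents does not change the pooled representation.
assert np.allclose(encode(obs), encode(obs[perm]))
```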

Continual Reinforcement Learning with Group Symmetries

This work proposes a continual RL framework with group symmetries that grows a policy for each group of equivalent tasks rather than for each individual task, and introduces a PPO-based algorithm with an invariant feature extractor and a task-grouping mechanism based on those invariant features.

Towards Applicable State Abstractions: a Preview in Strategy Games

An overview of related studies of state abstraction is given, and strategy games are proposed as a suitable platform for addressing open problems and studying the application of domain-independent state abstraction.

Equivariant Reinforcement Learning for Quadrotor UAV

An equivariance property of the quadrotor dynamics is identified such that the dimension of the state required during training is reduced by one, thereby substantially improving the sampling efficiency of reinforcement learning.
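
As one illustration of how such a dimension reduction can look (an assumption-laden sketch, not the paper's exact formulation or state layout): rotating the horizontal components of a quadrotor-like state into a yaw-aligned frame removes the yaw angle from the state, and the reduced state is unchanged when the whole state is rotated about the vertical axis.

```python
import numpy as np

def rot_z(angle):
    """Rotation about the vertical (gravity) axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def canonicalize(state):
    """Rotate horizontal components into a yaw-aligned frame so the yaw
    angle can be dropped, leaving one fewer state dimension.
    state = (position[3], velocity[3], yaw) -- an illustrative layout."""
    pos, vel, yaw = state[:3], state[3:6], state[6]
    r = rot_z(-yaw)
    return np.concatenate([r @ pos, r @ vel])     # 6 dims instead of 7

x = np.array([1.0, 2.0, 3.0, 0.1, -0.2, 0.0, 0.7])   # hypothetical state
phi = 1.2                                            # arbitrary world rotation
x_rot = np.concatenate([rot_z(phi) @ x[:3], rot_z(phi) @ x[3:6], [x[6] + phi]])

# Rotating the whole state about the vertical axis leaves the reduced
# state unchanged, so policy and value only need the 6 canonical inputs.
assert np.allclose(canonicalize(x), canonicalize(x_rot))
```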

Maximum Class Separation as Inductive Bias in One Matrix

This paper proposes maximum class separation as an inductive bias in the network, implemented by adding one matrix multiplication before computing the softmax activations, and illustrates that out-of-distribution and open-set recognition benefit from an embedded maximum separation.
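
A small numpy sketch of the "one matrix" idea as the title describes it (the embedding dimension and batch are illustrative assumptions): fix a matrix whose rows are maximally separated class prototypes, i.e. pairwise cosine -1/(C-1), and multiply the embedding by it before the softmax.

```python
import numpy as np

def max_separation_prototypes(num_classes):
    """Class prototypes with pairwise cosine -1/(C-1): C maximally
    separated unit vectors (vertices of a regular simplex)."""
    eye = np.eye(num_classes)
    p = eye - eye.mean(axis=0, keepdims=True)     # center the one-hot vectors
    return p / np.linalg.norm(p, axis=1, keepdims=True)

C = 10
P = max_separation_prototypes(C)                  # fixed matrix, not learned
cos = P @ P.T
assert np.allclose(cos[~np.eye(C, dtype=bool)], -1.0 / (C - 1))

# The one extra matrix multiplication: project embeddings onto the fixed
# prototypes before the softmax; `z` stands in for a batch of embeddings.
rng = np.random.default_rng(0)
z = rng.normal(size=(4, C))                       # hypothetical embeddings
logits = z @ P.T
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
assert np.allclose(probs.sum(axis=1), 1.0)
```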

Continuous MDP Homomorphisms and Homomorphic Policy Gradient

It is rigorously proved that performing the homomorphic policy gradient (HPG) on the abstract MDP is equivalent to performing the deterministic policy gradient (DPG) on the actual MDP, and that continuous MDP homomorphisms preserve value functions, which in turn enables their use for policy evaluation.
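
For reference, the standard discrete-space definition that the continuous case generalizes (notation assumed here, not quoted from the paper): an MDP homomorphism h = (f, {g_s}) maps M onto an abstract MDP while preserving rewards and aggregated transition probabilities, and hence optimal values.

```latex
% MDP homomorphism h=(f,\{g_s\}) from M=(S,A,P,R) to \bar{M}=(\bar{S},\bar{A},\bar{P},\bar{R}):
\begin{align*}
\bar{R}\big(f(s),\, g_s(a)\big) &= R(s,a), \\
\bar{P}\big(f(s') \mid f(s),\, g_s(a)\big) &= \sum_{s'' \in f^{-1}(f(s'))} P(s'' \mid s, a).
\end{align*}
% Value preservation (optimal value equivalence):
\begin{align*}
Q^{*}_{M}(s,a) = Q^{*}_{\bar{M}}\big(f(s),\, g_s(a)\big), \qquad
V^{*}_{M}(s) = V^{*}_{\bar{M}}\big(f(s)\big).
\end{align*}
```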

References

Showing 1-10 of 45 references.

MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning

This paper introduces MDP homomorphic networks for deep reinforcement learning and presents an easy method for constructing equivariant network layers numerically, so the system designer need not solve the equivariance constraints by hand, as is typically done.
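
A minimal numpy sketch of the numerical-construction idea (the group, representations, and sizes are illustrative assumptions; the paper additionally extracts a basis of the equivariant subspace via SVD, which is omitted here): symmetrizing an arbitrary weight matrix over the group yields an equivariant linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 4, 4

def perm_matrix(p):
    m = np.zeros((len(p), len(p)))
    m[np.arange(len(p)), p] = 1.0
    return m

# Input and output representations of a small group (Z_2, acting by
# reversing the coordinate order).
group = [
    (np.eye(d_in), np.eye(d_out)),                            # identity
    (perm_matrix([3, 2, 1, 0]), perm_matrix([3, 2, 1, 0])),   # flip
]

def symmetrize(W):
    """Project a weight matrix onto the equivariant subspace:
    S(W) = 1/|G| * sum_g rho_out(g)^-1 @ W @ rho_in(g)."""
    return sum(np.linalg.inv(r_out) @ W @ r_in for r_in, r_out in group) / len(group)

W = symmetrize(rng.normal(size=(d_out, d_in)))
x = rng.normal(size=d_in)
for r_in, r_out in group:
    # Transforming the input then applying the layer equals applying
    # the layer then transforming the output.
    assert np.allclose(W @ (r_in @ x), r_out @ (W @ x))
```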

Value-Decomposition Networks For Cooperative Multi-Agent Learning

This work addresses the problem of cooperative multi-agent reinforcement learning with a single joint reward signal by training individual agents with a novel value decomposition network architecture, which learns to decompose the team value function into agent-wise value functions.
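A minimal sketch of the value-decomposition idea (random tables stand in for the per-agent networks; sizes are illustrative assumptions): because the team value is a sum of per-agent values, the greedy joint action decomposes into independent per-agent argmaxes over local information only.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n_agents, n_obs, n_actions = 3, 5, 4        # hypothetical sizes

# Per-agent utilities Q_i(o_i, a_i); in VDN these come from individual
# networks trained on the single joint reward.
q_i = rng.normal(size=(n_agents, n_obs, n_actions))
obs = rng.integers(n_obs, size=n_agents)    # each agent's local observation

def q_tot(actions):
    """Team value as the sum of per-agent values (the VDN decomposition)."""
    return sum(q_i[i, obs[i], a] for i, a in enumerate(actions))

# Additivity lets each agent pick its action greedily and independently.
greedy = [int(q_i[i, obs[i]].argmax()) for i in range(n_agents)]

# Brute-force check that the decentralized choice maximizes Q_tot.
best = max(product(range(n_actions), repeat=n_agents), key=q_tot)
assert np.isclose(q_tot(greedy), q_tot(best))
```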

Multiagent Planning with Factored MDPs

This work presents a principled and efficient planning algorithm for cooperative multiagent dynamic systems that avoids the exponential blowup in the state and action space and is an efficient alternative to more complicated algorithms even in the single agent case.

Collaborative Multiagent Reinforcement Learning by Payoff Propagation

A set of scalable techniques is presented for learning the behavior of a group of agents in a collaborative multiagent setting, using the coordination-graph framework of Guestrin, Koller, and Parr (2002a); the work introduces several model-free reinforcement-learning techniques, collectively called Sparse Cooperative Q-learning, which approximate the global action-value function based on the topology of a coordination graph.
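
A small sketch of the coordination-graph factorization that payoff propagation operates on (the graph, sizes, and brute-force maximization are illustrative assumptions; max-plus message passing would exploit the sparse topology instead of enumerating joint actions).

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n_agents, n_actions = 3, 3
edges = [(0, 1), (1, 2)]                 # a small chain-shaped coordination graph

# Factored action value: individual terms plus one payoff per graph edge.
q_ind = rng.normal(size=(n_agents, n_actions))
q_edge = {e: rng.normal(size=(n_actions, n_actions)) for e in edges}

def q_global(a):
    return (sum(q_ind[i, a[i]] for i in range(n_agents))
            + sum(q_edge[i, j][a[i], a[j]] for i, j in edges))

# Greedy joint action by enumeration (n_actions ** n_agents candidates);
# payoff propagation or variable elimination avoids this blowup by
# passing messages along the edges only.
best = max(product(range(n_actions), repeat=n_agents), key=q_global)
print(best, q_global(best))
```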

Deep Coordination Graphs

It is shown that deep coordination graphs (DCG) can solve challenging predator-prey tasks that are vulnerable to the relative-overgeneralization pathology and in which all other known value-factorization approaches fail.

Planning, Learning and Coordination in Multiagent Decision Processes

The extent to which methods from single-agent planning and learning can be applied in multiagent settings is investigated, along with the decomposition of sequential decision processes so that coordination can be learned locally, at the level of individual states.

PIC: Permutation Invariant Critic for Multi-Agent Deep Reinforcement Learning

This work proposes a 'permutation invariant critic' (PIC) that yields identical output irrespective of the agent ordering, enabling the model to scale to 30 times more agents and to achieve improvements in test episode reward of 15% to 50% on the challenging multi-agent particle environment (MPE).

Decentralized Stochastic Planning with Anonymity in Interactions

This paper introduces a general model called D-SPAIT to capture anonymity in interactions, and provides optimization-based optimal and locally optimal solutions for generalizable subcategories of D-SPAIT.

Symmetry in Markov Decision Processes and its Implications for Single Agent and Multiagent Learning

It is proved that if an MDP possesses a symmetry, then the optimal value function and Q-function are similarly symmetric and there exists a symmetric optimal policy.
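
Stated in the usual notation (assumed here, not quoted from the paper), for a symmetry σ = (f, {g_s}) of the MDP, i.e. a homomorphism from the MDP onto itself:

```latex
\begin{align*}
Q^{*}\big(f(s),\, g_s(a)\big) &= Q^{*}(s, a) && \text{for all } s \in S,\ a \in A,\\
V^{*}\big(f(s)\big) &= V^{*}(s) && \text{for all } s \in S,
\end{align*}
% and an optimal policy can be chosen so that \pi^{*}(g_s(a) \mid f(s)) = \pi^{*}(a \mid s).
```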

"Other-Play" for Zero-Shot Coordination

This work introduces a novel learning algorithm called other-play (OP), which enhances self-play by searching for more robust strategies that exploit the presence of known symmetries in the underlying problem.
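
In the usual notation (assumed, not quoted from the paper), the other-play objective replaces plain self-play by relabelling the partner's policy with a symmetry drawn from the known symmetry group Φ of the problem:

```latex
\begin{equation*}
\pi^{\mathrm{OP}} \;=\; \arg\max_{\pi}\;
\mathbb{E}_{\phi \sim \mathrm{Unif}(\Phi)}
\big[\, J\big(\pi,\; \phi(\pi)\big) \,\big],
\end{equation*}
% where J(\pi_1, \pi_2) denotes the expected return when the two players
% follow \pi_1 and \pi_2 respectively.
```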