Q-Mixing Network for Multi-Agent Pathfinding in Partially Observable Grid Environments

@inproceedings{Davydov2021QMixingNF,
  title={Q-Mixing Network for Multi-Agent Pathfinding in Partially Observable Grid Environments},
  author={Vasilii Davydov and Alexey Skrynnik and Konstantin S. Yakovlev and Aleksandr I. Panov},
  booktitle={Russian Conference on Artificial Intelligence},
  year={2021}
}
In this paper, we consider the problem of multi-agent navigation in partially observable grid environments. This problem is challenging for centralized planning approaches as they, typically, rely on the full knowledge of the environment. We suggest utilizing the reinforcement learning approach when the agents, first, learn the policies that map observations to actions and then follow these policies to reach their goals. To tackle the challenge associated with learning cooperative behavior, i.e… 

Pathfinding in stochastic environments: learning vs planning

This work proposes a stochastic formulation of the pathfinding problem, assuming that obstacles of arbitrary shapes may appear and disappear at random moments of time, and considers the case when the environment is only partially observable for an agent.

Efficient Policy Space Response Oracles

Theoretically, the solution procedures of EPSRO offer a monotonic improvement in exploitability, which none of the existing PSRO methods possess, and it is proved that the no-regret optimization has a regret bound of $O\left(\sqrt{T \log[(k^2 + k)/2]}\right)$, where $k$ is the size of the restricted policy set.

References


PRIMAL: Pathfinding via Reinforcement and Imitation Multi-Agent Learning

PRIMAL is presented, a novel framework for MAPF that combines reinforcement and imitation learning to teach fully decentralized policies, where agents reactively plan paths online in a partially observable world while exhibiting implicit coordination.

Value-Decomposition Networks For Cooperative Multi-Agent Learning

This work addresses the problem of cooperative multi-agent reinforcement learning with a single joint reward signal by training individual agents with a novel value decomposition network architecture, which learns to decompose the team value function into agent-wise value functions.
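
The additive decomposition described above can be illustrated with a minimal sketch. It assumes the team value is the plain sum of per-agent values; the toy numbers and the helper names `vdn_joint_value` and `greedy_joint_action` are hypothetical and not taken from the paper. The point it shows is that, under a sum, each agent picking its own best action yields the same joint action as a centralized search.

```python
from itertools import product


def vdn_joint_value(per_agent_q, joint_action):
    """Team value as the sum of each agent's value for its own action (assumed additive form)."""
    return sum(q[a] for q, a in zip(per_agent_q, joint_action))


def greedy_joint_action(per_agent_q):
    """Each agent independently picks the action with the highest individual value."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in per_agent_q)


# Hypothetical per-agent action values: 2 agents, 3 actions each.
per_agent_q = [
    [0.1, 0.7, 0.2],
    [0.5, 0.4, 0.9],
]

decentralized = greedy_joint_action(per_agent_q)

# Brute-force search over all joint actions, for comparison only.
centralized = max(
    product(range(3), repeat=2),
    key=lambda joint: vdn_joint_value(per_agent_q, joint),
)
```

Because the sum is maximized term by term, `decentralized` and `centralized` coincide here, which is what lets agents act on local values at execution time.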

Grid Path Planning with Deep Reinforcement Learning: Preliminary Results

Multi-agent Path Finding with Kinematic Constraints via Conflict Based Search

An extensive empirical evaluation compares the suggested algorithm to state-of-the-art MAPF planners and provides clear evidence that the proposed method is as efficient as its predecessor, which is limited to a translation-only action model.

Prioritized Multi-agent Path Finding for Differential Drive Robots

A set of modifications to the prominent prioritized planner AA-SIPP(m) is suggested, aimed at lifting the most restrictive assumptions and making the solutions robust. The results provide clear evidence that the algorithm scales well to large numbers of robots and produces solutions that are safely executed by robots prone to imperfect trajectory following.

Navigating Autonomous Vehicle at the Road Intersection Simulator with Reinforcement Learning

This paper proposes the implementation of a control system based on a trainable behavior generation module for an agent that simulates the behavior of a self-driving car when passing a road intersection together with other vehicles.

QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations, and structurally enforces that the joint action-value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning.
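
The monotonicity constraint described above can be sketched in a few lines. This is a simplified illustration, not the published architecture: it assumes a two-layer mixer whose weights are forced non-negative by taking absolute values, with an ELU non-linearity in between. In QMIX the mixing weights are produced by hypernetworks conditioned on the global state; here they are fixed hypothetical numbers, and the function name `mix` is invented for the example.

```python
import math


def elu(x):
    """ELU activation; strictly increasing, so it preserves monotonicity."""
    return x if x > 0 else math.exp(x) - 1.0


def mix(agent_qs, w1, b1, w2, b2):
    """Combine per-agent values into one joint value.

    Taking abs() of every weight keeps all weights non-negative, so the
    output can never decrease when any single agent's value increases.
    Biases are left unconstrained.
    """
    hidden = [
        elu(sum(abs(w) * q for w, q in zip(row, agent_qs)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(abs(w) * h for w, h in zip(w2, hidden)) + b2


# Hypothetical fixed weights (assumption: stand-ins for hypernetwork outputs).
w1 = [[0.5, -1.0], [-0.3, 0.8]]  # 2 hidden units x 2 agents
b1 = [0.1, -0.2]
w2 = [-0.7, 0.4]
b2 = 0.05

low = mix([0.2, 0.5], w1, b1, w2, b2)
high = mix([0.9, 0.5], w1, b1, w2, b2)  # only agent 0's value is raised
```

Raising one agent's value from 0.2 to 0.9 raises the mixed value as well (`high > low`), even though several raw weights are negative, because only their magnitudes are used.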

Suboptimal Variants of the Conflict-Based Search Algorithm for the Multi-Agent Pathfinding Problem

This work proposes several ways to relax the optimality conditions of CBS, trading solution quality for runtime, as well as bounded-suboptimal variants, where the returned solution is guaranteed to be within a constant factor of the optimal solution cost.

Efficient SAT Approach to Multi-Agent Path Finding Under the Sum of Costs Objective

This paper presents the first SAT-solver for the sum-of-costs variant of MAPF, which was previously solved only by search-based methods, and keeps the number of variables in its SAT encoding reasonable.