# Coordinating the Crowd: Inducing Desirable Equilibria in Non-Cooperative Systems

@inproceedings{Mguni2019CoordinatingTC, title={Coordinating the Crowd: Inducing Desirable Equilibria in Non-Cooperative Systems}, author={David Henry Mguni and Joel Jennings and Sergio Valcarcel Macua and Emilio Sison and Sofia Ceppi and Enrique Munoz de Cote}, booktitle={AAMAS}, year={2019} }

Many real-world systems, such as taxi systems, traffic networks and smart grids, involve self-interested actors that perform individual tasks in a shared environment. However, in such systems, the self-interested behaviour of agents produces welfare-inefficient and globally suboptimal outcomes that are detrimental to all: common examples are congestion in traffic networks, demand spikes for resources in electricity grids, and over-extraction of environmental resources such as fisheries. We…

## 19 Citations

Adaptive Incentive Design with Multi-Agent Meta-Gradient Reinforcement Learning

- Economics, AAMAS
- 2022

A model-free meta-gradient method to learn an adaptive incentive function in the context of multi-agent reinforcement learning that can induce selfish agents to learn near-optimal cooperative behavior and significantly outperform learning-oblivious baselines.

Learning to Share in Multi-Agent Reinforcement Learning

- Computer Science, ArXiv
- 2021

Inspired by the fact that sharing plays a key role in humans' learning of cooperation, LToS is proposed: a hierarchically decentralized MARL framework that enables agents to learn to dynamically share reward with neighbors, so as to encourage agents to cooperate on the global objective.

Equilibrium Inverse Reinforcement Learning for Ride-hailing Vehicle Network

- Computer Science, WWW
- 2021

This work formulates the problem of passenger-vehicle matching in a sparsely connected graph, proposes an algorithm to derive an equilibrium policy in a multi-agent environment, and develops a method to learn a driver reward function that is transferable to an environment with significantly different dynamics from the training data.

End-to-End Learning and Intervention in Games

- Economics, NeurIPS
- 2020

This paper casts the equilibria of games as individual layers, integrates them into an end-to-end optimization framework, and proposes two approaches based respectively on explicit and implicit differentiation of the solutions to variational inequalities.

Social Contracts for Non-Cooperative Games

- Economics, AIES
- 2020

It is shown that, for any game, a suitable social contract can be designed to produce a socially optimal outcome, and that, for any alternative moral objective, there exist games for which no feasible social contract produces non-negligible social benefit relative to collective selfish behaviour.

Learning Expensive Coordination: An Event-Based Deep RL Approach

- Computer Science, ICLR
- 2020

This work models the leader's decision-making process as a semi-Markov decision process, proposes a novel multi-agent event-based policy gradient to learn the leader's long-term policy, and proposes an action-abstraction-based policy gradient algorithm to reduce the followers' decision space and thus accelerate the followers' training.

Partially Observable Mean Field Reinforcement Learning

- Computer Science, AAMAS
- 2021

This paper introduces a Q-learning based algorithm that can learn effectively in large environments where many agents learn simultaneously to achieve possibly distinct goals, and proves that this Q-learning estimate stays very close to the Nash Q-value for the first setting.
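A toy tabular sketch of the mean-field idea this line of work builds on, in which each agent's Q-function is conditioned on the empirical mean action of its neighbours; all state/action encodings and hyperparameters below are illustrative, not taken from the paper:

```python
from collections import defaultdict

def mean_action(neighbour_actions, n_actions):
    # Empirical distribution of the neighbours' actions (the "mean field").
    counts = [0] * n_actions
    for a in neighbour_actions:
        counts[a] += 1
    total = max(len(neighbour_actions), 1)
    return tuple(c / total for c in counts)

def mf_q_update(Q, s, a, mf, r, s_next, mf_next, n_actions,
                alpha=0.1, gamma=0.95):
    # One tabular mean-field Q-learning step: the Q-table is indexed by
    # (state, own action, discretised mean action of neighbours), so the
    # table size grows with one mean-field key instead of the joint action.
    target = r + gamma * max(Q[(s_next, b, mf_next)] for b in range(n_actions))
    Q[(s, a, mf)] += alpha * (target - Q[(s, a, mf)])

Q = defaultdict(float)
mf = mean_action([0, 1, 1], 2)        # neighbours mostly play action 1
mf_next = mean_action([1, 1, 1], 2)
mf_q_update(Q, s=0, a=1, mf=mf, r=1.0, s_next=0, mf_next=mf_next, n_actions=2)
print(round(Q[(0, 1, mf)], 3))        # 0.1 after a single update from zero init
```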

LIGS: Learnable Intrinsic-Reward Generation Selection for Multi-Agent Learning

- Computer Science, ArXiv
- 2021

A new general framework for improving coordination and performance in multi-agent reinforcement learning (MARL), the Learnable Intrinsic-Reward Generation Selection algorithm (LIGS), which introduces an adaptive learner, the Generator, that observes the agents and learns to construct intrinsic rewards online to coordinate the agents' joint exploration and joint behaviour.

Reward Design for Driver Repositioning Using Multi-Agent Reinforcement Learning

- Computer Science, ArXiv
- 2020

Learning to Share in Multi-Agent Reinforcement Learning

- Computer Science
- 2022

Inspired by the fact that sharing plays a key role in humans' learning of cooperation, LToS is proposed: a hierarchically decentralized MARL framework that enables agents to learn to dynamically share reward with neighbors, so as to encourage agents to cooperate on the global objective through collectives.

## References

Showing 1-10 of 33 references

Decentralised Learning in Systems with Many, Many Strategic Agents

- Computer Science, AAAI
- 2018

This paper proposes a learning protocol that is guaranteed to converge to equilibrium policies even when the number of agents is extremely large, and shows convergence to Nash-equilibrium policies in applications from economics and control theory with thousands of strategically interacting agents.

Learning Parametric Closed-Loop Policies for Markov Potential Games

- Computer Science, ICLR
- 2018

This work presents a closed-loop (CL) analysis for MPGs, considers parametric policies that depend on the current state and in which agents adapt to stochastic transitions, and shows that a CL Nash equilibrium can be found by solving a related optimal control problem (OCP).

Dynamic Potential Games in Communications: Fundamentals and Applications

- Computer Science, ArXiv
- 2015

This work applies the analysis and provides numerical methods to solve four key example problems, each with different features: energy demand control in a smart-grid network; network flow optimization in which the relays have bounded link capacity and limited battery life; uplink multiple-access communication with users that must optimize the use of their batteries; and two optimal scheduling games with nonstationary channels.

Dynamic Potential Games With Constraints: Fundamentals and Applications in Communications

- Computer Science, IEEE Transactions on Signal Processing
- 2016

This work applies the analysis and provides numerical methods to solve four example problems, termed dynamic potential games, whose solutions can be found through a single multivariate optimal control problem.

Prosocial learning agents solve generalized Stag Hunts better than selfish ones

- Economics, AAMAS
- 2018

It is shown that making a single agent prosocial, that is, making it care about the rewards of its partners, can increase the probability that groups converge to good outcomes, and it is shown experimentally that this result carries over to a variety of more complex environments with Stag Hunt-like dynamics, including ones where agents must learn from raw pixel input.

Theoretical considerations of potential-based reward shaping for multi-agent systems

- Computer Science, AAMAS
- 2011

It is proven that the equivalence to Q-table initialisation remains and that the Nash equilibria of the underlying stochastic game are not modified, and it is demonstrated empirically that potential-based reward shaping affects exploration and, consequently, can alter the joint policy converged upon.
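The shaping term at the heart of this result has a simple closed form, F(s, s') = γΦ(s') - Φ(s). A minimal sketch of why such a term leaves equilibria untouched (the states, rewards and potentials below are made-up toy values): summed along any trajectory, the shaping bonuses telescope to a policy-independent constant.

```python
def shaping_bonus(phi, s, s_next, gamma=0.9):
    # Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s).
    return gamma * phi[s_next] - phi[s]

# Toy episode over integer states with arbitrary illustrative potentials.
phi = {0: 1.0, 1: 2.5, 2: 0.0, 3: 4.0}
episode = [0, 1, 2, 3]        # visited states
rewards = [0.1, -0.2, 1.0]    # environment rewards per transition
gamma = 0.9

ret_plain = sum(gamma**t * r for t, r in enumerate(rewards))
ret_shaped = sum(
    gamma**t * (r + shaping_bonus(phi, episode[t], episode[t + 1], gamma))
    for t, r in enumerate(rewards)
)

# Telescoping: shaped return = plain return + gamma^T * phi(s_T) - phi(s_0),
# a constant offset that no policy can influence, so preferences between
# policies (and hence equilibria) are preserved.
T = len(rewards)
offset = gamma**T * phi[episode[-1]] - phi[episode[0]]
print(abs(ret_shaped - (ret_plain + offset)) < 1e-9)  # True
```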

Leader-Follower semi-Markov Decision Problems: Theoretical Framework and Approximate Solution

- Mathematics, Computer Science, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning
- 2007

This work proposes a Markov decision process (MDP) framework for a class of dynamic leader-follower problems with important applications, considers approximate solutions of these problems using RL, and demonstrates the solution approach in the special case where the followers' stochastic game is a repeated game.

Distributed Demand Management in Smart Grid with a Congestion Game

- Engineering, 2010 First IEEE International Conference on Smart Grid Communications
- 2010

In this paper we propose distributed load management in smart grid infrastructures to control the power demand at peak hours, by means of dynamic pricing strategies. The distributed solution that we…

Price of anarchy in transportation networks: efficiency and optimality control

- Economics, Physical Review Letters
- 2008

The simulation shows that uncoordinated drivers may waste a considerable amount of their travel time, and suggests that simply blocking certain streets can partially improve traffic conditions.

Balancing Two-Player Stochastic Games with Soft Q-Learning

- Computer Science, IJCAI
- 2018

This paper enables tuneable behaviour by generalising soft Q-learning to stochastic games, where more than one agent interacts strategically, and shows how tuning agents' constraints affects performance and how to reliably balance games with high-dimensional representations.
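The "soft" operator behind this family of methods is the temperature-scaled log-sum-exp; a small self-contained sketch (the Q-values are arbitrary illustrative numbers, not from the paper) of how the temperature acts as the tuning knob:

```python
import math

def soft_value(q_values, temperature):
    # Soft (maximum-entropy) state value: V = T * log(sum_a exp(Q(a) / T)).
    # As T -> 0 this approaches max_a Q(a) (greedy play); larger T yields a
    # softer value and more exploratory behaviour -- the knob used to tune
    # how aggressively each agent plays.
    m = max(q / temperature for q in q_values)  # stabilise the log-sum-exp
    return temperature * (
        m + math.log(sum(math.exp(q / temperature - m) for q in q_values))
    )

q = [1.0, 2.0, 0.5]
print(round(soft_value(q, 0.01), 3))  # 2.0: near the hard max
print(soft_value(q, 1.0) > 2.0)       # True: softer value exceeds the max
```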