• Corpus ID: 59413765

Coordinating the Crowd: Inducing Desirable Equilibria in Non-Cooperative Systems

David Henry Mguni, Joel Jennings, Sergio Valcarcel Macua, Emilio Sison, Sofia Ceppi, Enrique Munoz de Cote
Many real-world systems such as taxi systems, traffic networks and smart grids involve self-interested actors that perform individual tasks in a shared environment. In such systems, however, the self-interested behaviour of agents produces welfare-inefficient and globally suboptimal outcomes that are detrimental to all: common examples are congestion in traffic networks, demand spikes for resources in electricity grids and over-extraction of environmental resources such as fisheries. We…


Adaptive Incentive Design with Multi-Agent Meta-Gradient Reinforcement Learning
A model-free meta-gradient method to learn an adaptive incentive function in the context of multi-agent reinforcement learning that can induce selfish agents to learn near-optimal cooperative behavior and significantly outperform learning-oblivious baselines.
Learning to Share in Multi-Agent Reinforcement Learning
Inspired by the fact that sharing plays a key role in humans' learning of cooperation, LToS is proposed: a hierarchically decentralized MARL framework that enables agents to learn to dynamically share rewards with neighbors so as to encourage cooperation on the global objective.
Equilibrium Inverse Reinforcement Learning for Ride-hailing Vehicle Network
This work formulates the problem of passenger-vehicle matching in a sparsely connected graph, proposes an algorithm to derive an equilibrium policy in a multi-agent environment, and develops a method to learn a driver reward function transferable to an environment with significantly different dynamics from the training data.
End-to-End Learning and Intervention in Games
This paper casts the equilibria of games as individual layers, integrates them into an end-to-end optimization framework, and proposes two approaches based respectively on explicit and implicit differentiation of the solutions to variational inequalities.
Social Contracts for Non-Cooperative Games
It is shown that for any game, a suitable social contract can be designed to produce an optimal outcome in terms of social welfare, and that, for any alternative moral objective, there are games for which no feasible social contract produces non-negligible social benefit compared to collective selfish behaviour.
Learning Expensive Coordination: An Event-Based Deep RL Approach
This work models the leader's decision-making process as a semi-Markov decision process, proposes a novel multi-agent event-based policy gradient to learn the leader's long-term policy, and proposes an action-abstraction-based policy gradient algorithm to reduce the followers' decision space and thus accelerate their training.
Partially Observable Mean Field Reinforcement Learning
This paper introduces a Q-learning-based algorithm that can learn effectively in large environments where many agents learn simultaneously toward possibly distinct goals, and proves that this Q-learning estimate stays very close to the Nash Q-value for the first setting.
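The core idea in mean-field methods like the one above is that each agent conditions its Q-function on the empirical mean action of the other agents rather than on the full joint action. A minimal tabular sketch of such an update (all state names, payoffs and hyperparameters here are hypothetical, not taken from the paper):

```python
from collections import defaultdict

def mean_action(actions, n_actions):
    """Empirical distribution (one-hot average) of the other agents' actions."""
    counts = [0.0] * n_actions
    for a in actions:
        counts[a] += 1.0
    return tuple(round(c / len(actions), 6) for c in counts)

def mf_q_update(Q, s, a, abar, r, s_next, abar_next, n_actions,
                lr=0.1, gamma=0.95):
    """Move Q(s, a, abar) toward r + gamma * max_a' Q(s', a', abar')."""
    target = r + gamma * max(Q[(s_next, a2, abar_next)] for a2 in range(n_actions))
    Q[(s, a, abar)] += lr * (target - Q[(s, a, abar)])
    return Q[(s, a, abar)]

Q = defaultdict(float)
abar = mean_action([0, 1, 1], 2)   # neighbours mostly play action 1
new_q = mf_q_update(Q, "s0", 1, abar, 1.0, "s1", abar, 2)
```

Conditioning on the mean action keeps the Q-table size independent of the number of agents, which is what makes learning tractable in large populations.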
LIGS: Learnable Intrinsic-Reward Generation Selection for Multi-Agent Learning
A new general framework for improving coordination and performance of multi-agent reinforcement learners (MARL), named Learnable Intrinsic-Reward Generation Selection algorithm (LIGS), which introduces an adaptive learner, Generator that observes the agents and learns to construct intrinsic rewards online that coordinate the agents’ joint exploration and joint behaviour.


Decentralised Learning in Systems with Many, Many Strategic Agents
This paper proposes a learning protocol that is guaranteed to converge to equilibrium policies even when the number of agents is extremely large, and shows convergence to Nash-equilibrium policies in applications from economics and control theory with thousands of strategically interacting agents.
Learning Parametric Closed-Loop Policies for Markov Potential Games
This work presents a closed-loop analysis for MPGs, considers parametric policies that depend on the current state, where agents adapt to stochastic transitions, and shows that a closed-loop (CL) Nash equilibrium can be found by solving a related optimal control problem (OCP).
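The defining property of a potential game (in the simpler static case underlying the Markov version above) is that every unilateral payoff change equals the change in a single potential function, which is what lets equilibria be found via one optimization problem. A small self-contained check of this property on an illustrative identical-interest game (not an example from the paper):

```python
import itertools

def is_potential_game(payoffs, potential, n_actions):
    """payoffs[i][a]: utility of player i at joint action a (a tuple).
    Verifies u_i(a_i', a_-i) - u_i(a) == Phi(a_i', a_-i) - Phi(a) everywhere."""
    n_players = len(payoffs)
    for a in itertools.product(range(n_actions), repeat=n_players):
        for i in range(n_players):
            for dev in range(n_actions):
                b = a[:i] + (dev,) + a[i + 1:]
                if abs((payoffs[i][b] - payoffs[i][a]) -
                       (potential[b] - potential[a])) > 1e-9:
                    return False
    return True

# A 2-player coordination game: both players earn 1 on matching actions.
# Identical-interest games are potential games with Phi = common payoff.
u = {a: 1.0 if a[0] == a[1] else 0.0
     for a in itertools.product(range(2), repeat=2)}
assert is_potential_game([u, u], u, 2)
```

A zero-sum game such as matching pennies fails this check for any candidate potential built from one player's payoffs, which is one quick way to see that not every game admits a potential.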
Dynamic Potential Games in Communications: Fundamentals and Applications
This work applies the analysis and provides numerical methods to solve four key example problems, each with different features: energy demand control in a smart-grid network, network flow optimization in which the relays have bounded link capacity and limited battery life, uplink multiple-access communication with users that must optimize the use of their batteries, and two optimal scheduling games with nonstationary channels.
Dynamic Potential Games With Constraints: Fundamentals and Applications in Communications
This work applies the analysis and provides numerical methods to solve four example problems, named dynamic potential games, whose solution can be found through a single multivariate optimal control problem.
Prosocial learning agents solve generalized Stag Hunts better than selfish ones
It is shown that making a single agent prosocial, that is, making it care about the rewards of its partners, can increase the probability that groups converge to good outcomes, and it is shown experimentally that this result carries over to a variety of more complex environments with Stag-Hunt-like dynamics, including ones where agents must learn from raw input pixels.
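The mechanism can be seen in a two-player Stag Hunt: weighting a partner's reward into one's own flips which action is preferred against an uncertain opponent. A minimal sketch with illustrative payoff numbers (not those used in the paper):

```python
# Prosocial reward: r_i + lam * r_j. With lam large enough, the risky
# cooperative action (Stag) beats the safe one (Hare) in expectation.
STAG, HARE = 0, 1
PAYOFF = {(STAG, STAG): (4, 4), (STAG, HARE): (0, 3),
          (HARE, STAG): (3, 0), (HARE, HARE): (3, 3)}

def expected_reward(my_action, lam, p_stag=0.5):
    """Expected prosocial reward vs an opponent playing Stag w.p. p_stag."""
    total = 0.0
    for opp, p in ((STAG, p_stag), (HARE, 1 - p_stag)):
        r_me, r_opp = PAYOFF[(my_action, opp)]
        total += p * (r_me + lam * r_opp)
    return total

# A selfish agent (lam=0) prefers the safe Hare against a 50/50 opponent...
assert expected_reward(HARE, 0.0) > expected_reward(STAG, 0.0)
# ...while a prosocial agent (lam=1) prefers the cooperative Stag.
assert expected_reward(STAG, 1.0) > expected_reward(HARE, 1.0)
```

In other words, prosociality enlarges the basin of attraction of the payoff-dominant (Stag, Stag) equilibrium without changing the underlying game.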
Theoretical considerations of potential-based reward shaping for multi-agent systems
It is proven that the equivalence to Q-table initialisation remains and the Nash Equilibria of the underlying stochastic game are not modified, and it is demonstrated empirically that potential-based reward shaping affects exploration and, consequentially, can alter the joint policy converged upon.
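Potential-based shaping adds a bonus F(s, s') = γΦ(s') − Φ(s) to the reward; along any trajectory these terms telescope, which is why the shaping cannot change which policies are optimal. A short sketch with a made-up potential table:

```python
# Potential-based reward shaping: F(s, s') = gamma * Phi(s') - Phi(s).
GAMMA = 0.9

def shaping_bonus(phi, s, s_next, gamma=GAMMA):
    """Shaping term added to the environment reward; policy-invariant."""
    return gamma * phi[s_next] - phi[s]

# The discounted shaping bonuses along a trajectory telescope:
# sum_t gamma^t F(s_t, s_{t+1}) = gamma^T * Phi(s_T) - Phi(s_0),
# so the total added return depends only on the endpoints.
phi = {0: 0.0, 1: 1.0, 2: 3.0}   # hypothetical potential values
traj = [0, 1, 2]
total = sum(GAMMA**t * shaping_bonus(phi, traj[t], traj[t + 1])
            for t in range(len(traj) - 1))
assert abs(total - (GAMMA**2 * phi[2] - phi[0])) < 1e-12
```

The telescoping identity is the single-agent argument; the result summarized above extends the equivalence (to Q-table initialisation, and invariance of Nash equilibria) to the multi-agent stochastic-game setting.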
Leader-Follower semi-Markov Decision Problems: Theoretical Framework and Approximate Solution
  • K. Tharakunnel, S. Bhattacharyya
  • Mathematics, Computer Science
    2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning
  • 2007
This work proposes a Markov decision process (MDP) framework for a class of dynamic leader-follower problems that have important applications and considers approximate solution of these problems using RL and demonstrates the solution approach in the special case where the followers' stochastic game is a repeated game.
Distributed Demand Management in Smart Grid with a Congestion Game
In this paper we propose distributed load management in smart grid infrastructures to control the power demand at peak hours, by means of dynamic pricing strategies. The distributed solution that we
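Congestion-game pricing of the kind described above can be illustrated with a toy model in which households repeatedly best-respond to load-dependent slot prices, flattening the demand peak. All names and numbers here are illustrative, not the paper's model:

```python
# Toy congestion game: each household picks one of n_slots time slots;
# the per-unit price in a slot grows with the total demand placed there.
def price(load):
    return load  # linear dynamic price, purely for illustration

def best_slot(loads, my_slot):
    """Cheapest slot for one household, accounting for its own unit moving."""
    costs = [price(loads[j] + (0 if j == my_slot else 1))
             for j in range(len(loads))]
    return min(range(len(loads)), key=costs.__getitem__)

def best_response_dynamics(assignment, n_slots, rounds=10):
    """Iterate unilateral best responses; congestion games converge."""
    for _ in range(rounds):
        for i in range(len(assignment)):
            loads = [assignment.count(k) for k in range(n_slots)]
            assignment[i] = best_slot(loads, assignment[i])
    return assignment

# Six households all start in the peak slot 0 of 2 slots...
final = best_response_dynamics([0] * 6, 2)
loads = [final.count(s) for s in range(2)]
assert sorted(loads) == [3, 3]  # ...and the equilibrium balances the peak
```

Because congestion games are potential games, such best-response dynamics are guaranteed to reach a pure Nash equilibrium, which here is the balanced (peak-free) load profile.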
Price of anarchy in transportation networks: efficiency and optimality control.
This simulation shows that uncoordinated drivers possibly waste a considerable amount of their travel time, and suggests that simply blocking certain streets can partially improve the traffic conditions.
Balancing Two-Player Stochastic Games with Soft Q-Learning
This paper enables tuneable behaviour by generalising soft Q-learning to stochastic games, in which multiple agents interact strategically, and shows how tuning agents' constraints affects performance and how to reliably balance games with high-dimensional representations.
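The "tuneable behaviour" comes from soft Q-learning's temperature parameter: the soft value is a log-sum-exp of the Q-values, and the induced Boltzmann policy interpolates between greedy and uniform play. A minimal single-state sketch (Q-values are made up):

```python
import math

def soft_value(q_values, alpha):
    """Soft value V(s) = alpha * log sum_a exp(Q(s,a) / alpha)."""
    m = max(q_values)  # subtract the max to stabilise the exponentials
    return m + alpha * math.log(sum(math.exp((q - m) / alpha)
                                    for q in q_values))

def soft_policy(q_values, alpha):
    """Boltzmann policy pi(a|s) proportional to exp(Q(s,a) / alpha)."""
    v = soft_value(q_values, alpha)
    return [math.exp((q - v) / alpha) for q in q_values]

q = [1.0, 2.0, 0.5]
# Small temperature -> near-greedy value; large -> near-uniform policy.
assert abs(soft_value(q, 1e-3) - max(q)) < 1e-2
assert all(abs(p - 1 / 3) < 0.01 for p in soft_policy(q, 100.0))
```

In the two-player stochastic-game setting, each agent carries its own temperature, which is the knob the paper tunes to balance the game between the players.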