The Dynamics of Q-learning in Population Games: a Physics-Inspired Continuity Equation Model

  Shuyue Hu, Chin-wing Leung, Ho-fung Leung, Harold Soh
Although learning has found wide application in multi-agent systems, its effects on the temporal evolution of a system are far from understood. This paper focuses on the dynamics of Q-learning in large-scale multi-agent systems modeled as population games. We revisit the replicator equation model for Q-learning dynamics and observe that this model is inappropriate for the setting considered here. Motivated by this, we develop a new formal model, which bears a formal connection with the continuity…
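To make the setting concrete, the following is a minimal sketch of Boltzmann (softmax) Q-learning in a population game: many anonymous agents each keep a private Q-vector, and every agent's payoff depends only on the aggregate action frequencies of the population. The payoff function below (reward equals the share of the population playing your action) is a hypothetical coordination game chosen for illustration, not the paper's experimental setup.

```python
import numpy as np

def boltzmann_policy(q, temperature):
    """Softmax (Boltzmann) exploration distribution over Q-values."""
    z = np.exp((q - q.max()) / temperature)  # shift by max for stability
    return z / z.sum()

def simulate_population(n_agents=1000, n_actions=2, steps=200,
                        alpha=0.1, temperature=0.5, seed=0):
    """Each agent runs independent Q-learning; payoffs depend only on
    the population's aggregate action frequencies (a population game)."""
    rng = np.random.default_rng(seed)
    Q = rng.normal(0.0, 0.01, size=(n_agents, n_actions))
    for _ in range(steps):
        policies = np.apply_along_axis(boltzmann_policy, 1, Q, temperature)
        actions = np.array([rng.choice(n_actions, p=p) for p in policies])
        freq = np.bincount(actions, minlength=n_actions) / n_agents
        # Hypothetical coordination payoff: your reward is the fraction
        # of the population that chose the same action as you.
        rewards = freq[actions]
        idx = np.arange(n_agents)
        Q[idx, actions] += alpha * (rewards - Q[idx, actions])
    return freq

freq = simulate_population()
```

Tracking the full distribution of Q-vectors across such a population, rather than a single representative policy, is what motivates a continuity-equation-style model.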

Population Games and Deterministic Evolutionary Dynamics

Population Games and Evolutionary Dynamics

  • W. Sandholm · Economics · Economic Learning and Social Evolution · 2010
The text first considers population games, which provide a simple, powerful model for studying strategic interactions among large numbers of anonymous agents, and studies the dynamics of behavior in these games, providing foundations for two distinct approaches to aggregate behavior dynamics.

Frequency adjusted multi-agent Q-learning

Frequency Adjusted Q-learning (FAQ-learning) is proposed, a variation of Q-learning that perfectly adheres to the predictions of the evolutionary model for an arbitrarily large part of the policy space.
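The core idea of FAQ-learning is to scale the learning rate of the chosen action inversely with the probability of playing it, so that rarely explored actions are updated proportionally faster and the empirical dynamics match the replicator prediction. A minimal sketch of one update step, assuming the common min(beta/x_a, 1) clipping form (parameter values here are illustrative):

```python
import numpy as np

def faq_update(q, policy, action, reward, alpha=0.1, beta=0.01, gamma=0.0):
    """One Frequency Adjusted Q-learning (FAQ) step.

    The effective learning rate for the chosen action is
    min(beta / x_a, 1) * alpha, where x_a is the probability of that
    action under the current policy, so infrequently played actions
    receive proportionally larger updates.
    """
    lr = min(beta / policy[action], 1.0) * alpha
    target = reward + gamma * q.max()  # stateless games typically use gamma=0
    q = q.copy()
    q[action] += lr * (target - q[action])
    return q
```

For example, with policy (0.9, 0.1) and identical rewards, the rarely played action 1 receives a larger Q-update than the frequently played action 0, which is exactly the frequency adjustment.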

Modelling the Dynamics of Multiagent Q-Learning in Repeated Symmetric Games: a Mean Field Theoretic Approach

This paper studies an n-agent setting in which n tends to infinity, where agents learn their policies concurrently over repeated symmetric bimatrix games against other agents, and derives a Fokker-Planck equation that describes the evolution of the probability distribution of Q-values in the agent population.

Catastrophe by Design in Population Games: Destabilizing Wasteful Locked-in Technologies

This paper develops and analyzes a mechanism that induces transitions from inefficient lock-ins to superior alternatives and is shown to be structurally robust to significant and even adversarially chosen perturbations to the parameters of both the game and the behavioral model.

A selection-mutation model for q-learning in multi-agent systems

This work shows how the Replicator Dynamics (RD) can be used as a model for Q-learning in games and reveals an interesting connection between the exploration-exploitation scheme from RL and the selection-mutation mechanisms from evolutionary game theory.
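One commonly cited form of this selection-mutation model, for a single population with mixed strategy x playing payoff matrix A, learning rate α, and Boltzmann temperature τ, decomposes the Q-learning dynamics into a replicator (selection) term and an entropy-driven exploration (mutation) term; the exact placement of α and τ varies slightly across presentations:

```latex
\dot{x}_i \;=\;
\underbrace{\frac{\alpha}{\tau}\, x_i \Big[ (Ax)_i - x^{\top} A x \Big]}_{\text{selection (replicator)}}
\;+\;
\underbrace{\alpha\, x_i \sum_{j} x_j \ln \frac{x_j}{x_i}}_{\text{mutation (exploration)}}
```

The first term drives probability mass toward actions with above-average payoff, while the second pulls the strategy toward the uniform distribution, mirroring Boltzmann exploration in the underlying Q-learner.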

Coupled replicator equations for the dynamics of learning in multiagent systems.

This work derives coupled replicator equations that describe the dynamics of collective learning in multiagent systems and shows that, although agents model their environment in a self-interested way without sharing knowledge, a game dynamics emerges naturally through environment-mediated interactions.

A common gradient in multi-agent reinforcement learning

Inherent similarities are demonstrated between two diverse families of multi-agent reinforcement learning algorithms by comparing their underlying learning dynamics, derived as the continuous-time limit of their policy updates.

A Theoretical Framework for Large-Scale Human-Robot Interaction with Groups of Learning Agents

The model-based Theoretical Human-Robot Scenarios (THuS) framework is introduced, capable of elucidating the interactions between large groups of humans and learning robots, and its application to a human-robot variant of the n-player coordination game is considered.