Exploration-Exploitation in Multi-Agent Learning: Catastrophe Theory Meets Game Theory

@article{Leonardos2021ExplorationExploitationIM,
  title={Exploration-Exploitation in Multi-Agent Learning: Catastrophe Theory Meets Game Theory},
  author={Stefanos Leonardos and Georgios Piliouras},
  journal={Artif. Intell.},
  year={2021},
  volume={304},
  pages={103653}
}

Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality

It is shown that, in weighted zero-sum polymatrix games with heterogeneous learning agents using positive exploration rates, Q-learning always converges to the unique quantal response equilibrium (QRE), the standard solution concept for games under bounded rationality.
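
The QRE referenced here is the logit equilibrium: each player smooth-best-responds to the other through a softmax with positive temperature. Below is a minimal sketch of computing it by damped fixed-point iteration in a two-player zero-sum matrix game; the game, temperature, and damping are illustrative choices, not the paper's setup.

```python
# A sketch of logit QRE via damped fixed-point iteration (illustrative).
import numpy as np

def softmax(u, temperature):
    z = u / temperature
    z = z - z.max()                       # for numerical stability
    e = np.exp(z)
    return e / e.sum()

def logit_qre(A, temperature=0.5, damping=0.5, iters=5000, tol=1e-10):
    """Fixed point of x = softmax(A y / T), y = softmax(-A^T x / T)."""
    m, n = A.shape
    x, y = np.full(m, 1.0 / m), np.full(n, 1.0 / n)
    for _ in range(iters):
        x_new = (1 - damping) * x + damping * softmax(A @ y, temperature)
        y_new = (1 - damping) * y + damping * softmax(-A.T @ x, temperature)
        if max(abs(x_new - x).max(), abs(y_new - y).max()) < tol:
            return x_new, y_new
        x, y = x_new, y_new
    return x, y

# rock-paper-scissors: the unique QRE is uniform play for every T > 0
A = np.array([[ 0.0,  1.0, -1.0],
              [-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0]])
print(logit_qre(A))
```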

The Dynamics of Q-learning in Population Games: a Physics-Inspired Continuity Equation Model

A new formal model is developed that accurately describes the Q-learning dynamics in population games across different initial settings of multi-agent systems and game configurations; it can be applied to different exploration mechanisms, describe the mean dynamics, and be extended to Q-learning in 2-player and n-player games.

Fast Convergence of Optimistic Gradient Ascent in Network Zero-Sum Extensive Form Games

This work represents an initial foray into the world of online learning dynamics in network extensive form games, proving that OGA results in both time-average and day-to-day convergence to the set of Nash Equilibria.
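
As a hedged illustration of why optimism yields day-to-day (last-iterate) convergence, the sketch below uses nothing network-specific: it runs optimistic gradient ascent-descent on the simplest unconstrained bilinear zero-sum game f(x, y) = xy, whose unique equilibrium is (0, 0). Plain simultaneous gradient play spirals outward here, while the optimistic iterates spiral in.

```python
# Optimistic gradient ascent-descent on f(x, y) = x * y (illustrative).
eta = 0.1
x, y = 1.0, 1.0
gx_prev, gy_prev = y, x                   # gradients from the "previous" step

for t in range(2000):
    gx, gy = y, x                         # df/dx = y, df/dy = x
    x = x + eta * (2 * gx - gx_prev)      # maximizer: optimistic ascent
    y = y - eta * (2 * gy - gy_prev)      # minimizer: optimistic descent
    gx_prev, gy_prev = gx, gy

print(x, y)  # both ~0: day-to-day (last-iterate) convergence
```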

Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence

To learn a Nash equilibrium of an MPG in which the size of state space and/or the number of players can be very large, new independent policy gradient algorithms are proposed that are run by all players in tandem.
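
A minimal sketch of the independent-learning idea, in a far simpler setting than the paper's (a one-shot identical-interest game, which is a potential game, rather than a large Markov potential game): each player runs softmax policy gradient on its own payoff with no coordination, and joint play reaches a pure Nash profile.

```python
# Independent softmax policy gradient in an identical-interest game.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# identical-interest (hence potential) game: both players receive U[i, j]
U = np.array([[1.0, 0.0],
              [0.0, 2.0]])

theta1, theta2 = np.zeros(2), np.zeros(2)
eta = 0.5
for t in range(2000):
    p1, p2 = softmax(theta1), softmax(theta2)
    u1 = U @ p2                 # player 1's expected payoff per action
    u2 = U.T @ p1               # player 2's expected payoff per action
    # exact policy gradient w.r.t. softmax logits: p_i * (u_i - p.u)
    theta1 += eta * p1 * (u1 - p1 @ u1)
    theta2 += eta * p2 * (u2 - p2 @ u2)

print(softmax(theta1), softmax(theta2))  # converges to the (2, 2) pure Nash
```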

Optimal No-Regret Learning in General Games: Bounded Regret with Unbounded Step-Sizes via Clairvoyant MWU

It is established that self-consistent mental models exist for any choice of step-sizes, and bounds on the step-size are provided under which their uniqueness and linear-time computation are guaranteed via contraction-mapping arguments.
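
Clairvoyant MWU's "self-consistent mental model" is an implicit update: each player's next strategy is a multiplicative-weights step evaluated at the *next* joint strategy, solved here by an inner fixed-point loop that contracts for small step-sizes. A minimal sketch in a zero-sum matrix game; the game, step-size, and inner-iteration count are illustrative assumptions.

```python
# Clairvoyant MWU sketch: an implicit MWU step solved by inner iteration.
import numpy as np

def mwu_step(x, payoff, eta):
    w = x * np.exp(eta * payoff)
    return w / w.sum()

def clairvoyant_step(x, y, A, eta, inner_iters=50):
    """Solve x+ = MWU(x, A y+), y+ = MWU(y, -A^T x+) by fixed-point iteration."""
    x_next, y_next = x.copy(), y.copy()
    for _ in range(inner_iters):
        x_next = mwu_step(x, A @ y_next, eta)     # payoff at the *next* y
        y_next = mwu_step(y, -A.T @ x_next, eta)  # zero-sum opponent
    return x_next, y_next

A = np.array([[ 0.0,  1.0, -1.0],      # rock-paper-scissors
              [-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0]])
x = np.array([0.6, 0.3, 0.1])
y = np.array([0.2, 0.5, 0.3])
for t in range(200):
    x, y = clairvoyant_step(x, y, A, eta=0.5)
print(x, y)  # iterates head toward the uniform equilibrium of RPS
```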

Balancing Collective Exploration and Exploitation in Multi-Agent and Multi-Robot Systems: A Review

This review summarizes and categorizes the methods used to control the level of exploration and exploitation carried out by multi-agent systems, as well as the overall performance of a system with a given cooperative control algorithm.

Adaptive Algorithms and Collusion via Coupling

The mechanism responsible for the collusion between artificial intelligence algorithms documented by recent experimental evidence is uncovered: spontaneous coupling between the algorithms' estimates leads to periodic coordination on actions that are more profitable than static Nash equilibria.

Learning in Markets: Greed Leads to Chaos but Following the Price is Right

The findings suggest that, by considering multi-agent interactions from a market rather than a game-theoretic perspective, one can formally derive natural learning protocols that are stable and converge to effective outcomes rather than being chaotic.

Adaptive Algorithms, Tacit Collusion, and Design for Competition

It is proved that algorithms using counterfactual returns to inform their updates avoid this coupling bias and converge to dominant strategies rather than sustaining collusive actions in the long run.

References

Showing 1-10 of 66 references

Individual Q-Learning in Normal Form Games

This work considers the behavior of value-based learning agents in the multi-agent multi-armed bandit problem, and shows that such agents cannot generally play at a Nash equilibrium, although if smooth best responses are used, a Nash distribution can be reached.

Frequency adjusted multi-agent Q-learning

Frequency Adjusted Q-learning (FAQ-learning), a variation of Q-learning that perfectly adheres to the predictions of the evolutionary model for an arbitrarily large part of the policy space, is proposed.
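
The frequency-adjusted update rescales the learning rate of the chosen action by min(β/x_a, 1), so infrequently played actions are not updated more slowly in expectation than frequently played ones. A minimal stateless sketch follows; the rewards, α, β, and temperature are illustrative.

```python
# FAQ-learning sketch: frequency-adjusted Q-updates under a Boltzmann policy.
import numpy as np
rng = np.random.default_rng(0)

def boltzmann(Q, temperature):
    z = Q / temperature
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def faq_update(Q, a, reward, policy, alpha=0.1, beta=0.01):
    # standard Q-learning would use alpha alone; FAQ rescales by
    # min(beta / x_a, 1) so every action's Q-value moves at roughly the
    # same expected rate regardless of how often the action is played
    rate = min(beta / policy[a], 1.0) * alpha
    Q[a] += rate * (reward - Q[a])

# two-action example with mean rewards [1.0, 0.5]
Q = np.zeros(2)
for t in range(10000):
    pi = boltzmann(Q, temperature=0.2)
    a = rng.choice(2, p=pi)
    r = rng.normal([1.0, 0.5][a], 0.1)
    faq_update(Q, a, r, pi)
print(Q)  # approaches the true mean rewards [1.0, 0.5]
```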

An Evolutionary Dynamical Analysis of Multi-Agent Learning in Iterated Games

It is shown how evolutionary dynamics from evolutionary game theory can help the developer of a multi-agent system make good choices of parameter settings for the RL algorithms used, and how the improved results for multi-agent RL in COIN, and a developed extension, are predicted by the evolutionary dynamics.

Complex dynamics in learning complicated games

This work investigates two-person games in which the players learn based on a type of reinforcement learning called experience-weighted attraction (EWA), and suggests that there is a large parameter regime for which complicated strategic interactions generate inherently unpredictable behavior that is best described in the language of dynamical systems theory.

Dynamics of Boltzmann Q learning in two-player two-action games.

It is demonstrated that, for certain games with a single Nash equilibrium, it is possible to have additional rest points that persist for a finite range of the exploration rates and disappear when the exploration rates of both players tend to zero.
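
The underlying model is the Boltzmann Q-learning dynamics: a replicator-style drift plus a temperature-weighted entropy (mutation) term. The sketch below integrates these dynamics for a 2x2 game at several exploration rates T to show how the rest point moves with T; the game and rates are illustrative, not the paper's examples.

```python
# Boltzmann Q-learning dynamics: replicator drift + T-weighted mutation.
import numpy as np

def flow(p, u, T):
    # dp_i = p_i[(u)_i - p.u] + T p_i (sum_j p_j ln p_j - ln p_i)
    ent = (p * np.log(p)).sum()
    return p * (u - p @ u) + T * p * (ent - np.log(p))

def rest_point(A, B, T, steps=20000, dt=0.01):
    x, y = np.array([0.9, 0.1]), np.array([0.2, 0.8])
    for _ in range(steps):
        x = x + dt * flow(x, A @ y, T)
        y = y + dt * flow(y, B.T @ x, T)
        x = np.clip(x, 1e-12, None); x = x / x.sum()
        y = np.clip(y, 1e-12, None); y = y / y.sum()
    return x, y

# prisoner's dilemma (rows/cols: cooperate, defect); single NE at defect
A = np.array([[3.0, 0.0],
              [5.0, 1.0]])
B = A.T                                   # symmetric game
for T in (0.1, 1.0, 5.0):
    print(T, rest_point(A, B, T))         # rest point shifts with T
```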

α-Rank: Multi-Agent Evaluation by Evolution

We introduce α-Rank, a principled evolutionary dynamics methodology for the evaluation and ranking of agents in large-scale multi-agent interactions, grounded in a novel dynamical game-theoretic solution concept.

A selection-mutation model for q-learning in multi-agent systems

This work shows how the Replicator Dynamics (RD) can be used as a model for Q-learning in games and reveals an interesting connection between the exploitation-exploration scheme from RL and the selection-mutation mechanisms from evolutionary game theory.

Evolutionary Dynamics of Regret Minimization

The evolutionary dynamics of the polynomial-weights regret-minimization learning algorithm are formally derived as a system of differential equations, making it easy to investigate parameter settings and analyze the dynamics of multiple concurrently learning agents using regret minimization.
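
For reference, the polynomial-weights rule whose dynamics are derived multiplies each action's weight by (1 - η·loss) each round. A minimal sketch of the discrete algorithm itself; random losses stand in for an adversary, and η and the horizon are illustrative.

```python
# Polynomial weights: multiplicative update w_i *= (1 - eta * loss_i).
import numpy as np
rng = np.random.default_rng(1)

n_actions, eta, horizon = 3, 0.1, 5000
w = np.ones(n_actions)
alg_loss, cum_losses = 0.0, np.zeros(n_actions)

for t in range(horizon):
    p = w / w.sum()
    losses = rng.uniform(0.0, 1.0, n_actions)  # stand-in for an adversary
    alg_loss += p @ losses
    cum_losses += losses
    w *= 1.0 - eta * losses                    # requires eta * loss < 1

print(alg_loss - cum_losses.min())  # external regret, sublinear in horizon
```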

Penalty-Regulated Dynamics and Robust Learning Procedures in Games

A new class of continuous-time learning dynamics is designed, consisting of a replicator-like drift adjusted by a penalty term that renders the boundary of the game's strategy space repelling, along with a discrete-time, payoff-based learning algorithm that retains these convergence properties and only requires players to observe their in-game payoffs.

Learning in Games via Reinforcement and Regularization

This paper extends several properties of exponential learning, including the elimination of dominated strategies, the asymptotic stability of strict Nash equilibria, and the convergence of time-averaged trajectories in zero-sum games with an interior Nash equilibrium.
...