Limiting dynamics for Q-learning with memory one in two-player, two-action games

  • Janusz M. Meylahn
  • Published 2021
  • Mathematics, Physics
  • Computation Theory eJournal
We develop a computational method to identify all pure-strategy equilibrium points in the strategy space of two-player, two-action repeated games played by Q-learners with one-period memory. To approximate the dynamics of these Q-learners, we construct a graph of pure-strategy mutual best responses. We apply this method to the iterated prisoner's dilemma and find that there are exactly three absorbing states. By analyzing the graph for various values of the discount factor, we find…
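The idea of searching for pure-strategy mutual best responses can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes each memory-one pure strategy is a map from the four previous-round outcomes to an action, that a best response is the greedy policy of the discounted decision problem induced by a fixed opponent (solved here by plain value iteration), and that the prisoner's dilemma payoffs are T=5, R=3, P=1, S=0. The names `best_response`, `mutual_best_responses`, and `PAYOFF` are hypothetical.

```python
from itertools import product

C, D = 0, 1
# Assumed prisoner's dilemma payoffs: PAYOFF[own action][opponent's action].
PAYOFF = [[3.0, 0.0], [5.0, 1.0]]


def best_response(opp, delta, payoff=PAYOFF, iters=2000):
    """Greedy memory-one policy against a fixed memory-one opponent.

    A state is (own previous action, opponent's previous action), stored at
    index 2 * own + other. A strategy is a 4-tuple of actions, one per state.
    """
    q = [[0.0, 0.0] for _ in range(4)]
    for _ in range(iters):  # value iteration on the induced decision problem
        new_q = [[0.0, 0.0] for _ in range(4)]
        for own_prev, opp_prev in product((C, D), repeat=2):
            s = 2 * own_prev + opp_prev
            # The opponent sees the same history with the roles swapped.
            b = opp[2 * opp_prev + own_prev]
            for a in (C, D):
                s_next = 2 * a + b
                new_q[s][a] = payoff[a][b] + delta * max(q[s_next])
        q = new_q
    # Ties are broken in favour of cooperation (an arbitrary choice).
    return tuple(C if q[s][C] >= q[s][D] else D for s in range(4))


def mutual_best_responses(delta):
    """All ordered pairs of pure strategies that are best responses to each other."""
    strategies = list(product((C, D), repeat=4))  # 16 memory-one pure strategies
    br = {s: best_response(s, delta) for s in strategies}
    return [(s1, s2) for s1 in strategies for s2 in strategies
            if br[s2] == s1 and br[s1] == s2]


if __name__ == "__main__":
    for pair in mutual_best_responses(0.9):
        print(pair)
```

Calling `mutual_best_responses` for several values of `delta` gives a rough picture of how the set of mutually consistent strategy pairs depends on the discount factor under these assumptions.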


Classes of Multiagent Q-learning Dynamics with epsilon-greedy Exploration
This work derives and studies an idealization of Q-learning in 2-player, 2-action repeated general-sum games. It addresses the discontinuous case of ε-greedy exploration and uses it as a proxy for value-based algorithms to highlight a contrast with existing results in policy search.
Symmetric equilibrium of multi-agent reinforcement learning in repeated prisoner's dilemma
Among sixteen deterministic strategies, the Win-stay Lose-shift strategy, the Grim strategy, and the strategy that always defects are found to form symmetric equilibria of the mutual reinforcement learning process.
Learning dynamics in social dilemmas
  • M. Macy, A. Flache
  • Sociology, Economics
  • Proceedings of the National Academy of Sciences of the United States of America
  • 2002
The Nash equilibrium, the main solution concept in analytical game theory, cannot make precise predictions about the outcome of repeated mixed-motive games. Nor can it tell us much about the dynamics…
Q-learning agents in a Cournot oligopoly model
Q-learning is a reinforcement learning model from the field of artificial intelligence. We study the use of Q-learning for modeling the learning behavior of firms in repeated Cournot oligopoly…
Formalizing Multi-state Learning Dynamics
Results show that piecewise replicator dynamics qualitatively approximate multi-agent reinforcement learning in stochastic games.
Deterministic limit of temporal difference reinforcement learning for stochastic games
This work presents a methodological extension, separating the interaction timescale from the adaptation timescale, to derive the deterministic limit of a general class of reinforcement learning algorithms called temporal difference learning, which is equipped to function in more realistic multistate environments.
Reinforcement Learning Dynamics in the Infinite Memory Limit
This paper proposes a data-inefficient batch-learning algorithm for temporal difference Q-learning and shows that it converges to a recently proposed deterministic limit of temporal difference reinforcement learning.
Q-learning
This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
Cooperative Multi-Agent Learning: The State of the Art
This survey attempts to draw from multi-agent learning work in a spectrum of areas, including RL, evolutionary computation, game theory, complex systems, agent modeling, and robotics, and finds that this broad view leads to a division of the work into two categories.
Incorporating Fairness into Game Theory and Economics
People like to help those who are helping them and to hurt those who are hurting them. Outcomes reflecting such motivations are called fairness equilibria. Outcomes are mutual-max when each person…