Classes of Multiagent Q-learning Dynamics with epsilon-greedy Exploration


Q-learning in single-agent environments is known to converge in the limit given sufficient exploration. The same algorithm has been applied, with some success, in multiagent environments, where traditional analysis techniques break down. Using established dynamical systems methods, we derive and study an idealization of Q-learning in 2-player 2-action…
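The setting the abstract describes can be illustrated with a small simulation. The Python sketch below runs two independent Q-learners with ε-greedy exploration in a repeated 2-player 2-action game. Several choices here are assumptions made only for illustration: the Prisoner's Dilemma payoffs in `R1` and `R2`, the stateless (single-state) formulation, and the parameters `alpha`, `gamma` and `epsilon`. This is ordinary sampled Q-learning. It is not the idealized dynamics that the paper derives and analyzes.

```python
import random

# Hypothetical payoffs for a 2-player 2-action game (a Prisoner's Dilemma).
# Rows index player 1's action and columns index player 2's action.
# Action 0 = cooperate, action 1 = defect.
R1 = [[3.0, 0.0], [5.0, 1.0]]
R2 = [[3.0, 5.0], [0.0, 1.0]]


def epsilon_greedy(q, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    # Ties are broken toward the lowest action index.
    return max(range(len(q)), key=lambda a: q[a])


def run_independent_q_learning(steps=10000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Two independent, stateless Q-learners repeatedly playing the 2x2 game.

    Each learner only sees its own action and reward; the other player is
    simply part of its (non-stationary) environment.
    """
    rng = random.Random(seed)
    q1 = [0.0, 0.0]
    q2 = [0.0, 0.0]
    history = []
    for _ in range(steps):
        a1 = epsilon_greedy(q1, epsilon, rng)
        a2 = epsilon_greedy(q2, epsilon, rng)
        r1 = R1[a1][a2]
        r2 = R2[a1][a2]
        # Single-state Q-learning: the next state is the same state, so the
        # bootstrap target uses the max over each learner's own Q-values.
        q1[a1] += alpha * (r1 + gamma * max(q1) - q1[a1])
        q2[a2] += alpha * (r2 + gamma * max(q2) - q2[a2])
        history.append((a1, a2))
    return q1, q2, history
```

For example, `sum(1 for a, b in history if a == b == 0) / len(history)` gives the fraction of rounds with mutual cooperation. Rerunning with different `epsilon` values shows how sensitive these stochastic learning trajectories are to the amount of exploration.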


5 Figures and Tables


