Corpus ID: 235658380

Last-iterate Convergence in Extensive-Form Games

@article{Lee2021LastiterateCI,
  title={Last-iterate Convergence in Extensive-Form Games},
  author={Chung-wei Lee and Christian Kroer and Haipeng Luo},
  journal={ArXiv},
  year={2021},
  volume={abs/2106.14326}
}
Regret-based algorithms are highly efficient at finding approximate Nash equilibria in sequential games such as poker games. However, most regret-based algorithms, including counterfactual regret minimization (CFR) and its variants, rely on iterate averaging to achieve convergence. Inspired by recent advances on last-iterate convergence of optimistic algorithms in zero-sum normal-form games, we study this phenomenon in sequential games, and provide a comprehensive study of last-iterate…
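To make the contrast with iterate averaging concrete, here is a minimal self-play sketch in a zero-sum normal-form game rather than an extensive-form game. It is only an illustration, not the paper's algorithm; the game matrix, step size, initial strategies, and iteration count are arbitrary choices. Vanilla multiplicative weights update (MWU) does not converge pointwise on matching pennies (its last iterate keeps rotating around the mixed equilibrium), while the optimistic variant (OMWU) approaches the equilibrium in its last iterate.

```python
# Illustrative sketch only: last-iterate behaviour of MWU vs. OMWU in self-play
# on matching pennies (unique mixed equilibrium at (1/2, 1/2) for both players).
import numpy as np

A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])   # row player maximizes x^T A y, column player minimizes
eta = 0.1                     # illustrative step size

def self_play(optimistic, T=5000):
    x = np.array([0.7, 0.3])                 # row player's mixed strategy
    y = np.array([0.2, 0.8])                 # column player's mixed strategy
    gx_prev, gy_prev = A @ y, -(A.T @ x)     # previous-round payoff gradients
    for _ in range(T):
        gx, gy = A @ y, -(A.T @ x)           # each player's payoff per action
        # OMWU replaces the gradient by the "optimistic" estimate 2*g_t - g_{t-1}
        ux = 2 * gx - gx_prev if optimistic else gx
        uy = 2 * gy - gy_prev if optimistic else gy
        x = x * np.exp(eta * ux); x /= x.sum()
        y = y * np.exp(eta * uy); y /= y.sum()
        gx_prev, gy_prev = gx, gy
    return x, y

print("MWU  last iterate:", self_play(optimistic=False))  # stays away from (1/2, 1/2): it cycles
print("OMWU last iterate:", self_play(optimistic=True))   # approaches (1/2, 1/2)
```

Averaging the MWU iterates would also recover the equilibrium; the point of the last-iterate line of work is that the optimistic update makes the final strategies themselves converge.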


References

Showing 1-10 of 51 references
Optimistic Regret Minimization for Extensive-Form Games via Dilated Distance-Generating Functions
Shows that when the goal is minimizing regret, rather than computing a Nash equilibrium, optimistic methods can outperform CFR+ even in deep game trees, and that the regret decomposition arising from dilated distance-generating functions mirrors the structure of the counterfactual regret minimization framework.
Solving Imperfect-Information Games via Discounted Regret Minimization
Introduces novel CFR variants that 1) discount regrets from earlier iterations in various ways, 2) reweight iterations in various ways to obtain the output strategies, 3) use a non-standard regret minimizer, and/or 4) leverage "optimistic regret matching".
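As a rough illustration of ideas 1) and 2), the sketch below adds discounting and weighted averaging to plain regret matching at a single decision point (rock-paper-scissors in self-play). The simple linear weights are an assumption made for brevity, not the paper's (alpha, beta, gamma)-parameterized schedule, and the game and iteration count are arbitrary.

```python
# Illustrative sketch: discounted regrets + weighted strategy averaging for a
# single decision point (normal-form rock-paper-scissors), using linear weights
# rather than the DCFR paper's (alpha, beta, gamma) discounting.
import numpy as np

A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])   # row player's payoff; the equilibrium is uniform play

def regret_matching(R):
    pos = np.maximum(R, 0.0)
    return pos / pos.sum() if pos.sum() > 0 else np.full(len(R), 1.0 / len(R))

Rx, Ry = np.zeros(3), np.zeros(3)          # discounted cumulative regrets
avg_x, avg_y = np.zeros(3), np.zeros(3)    # weighted average strategies (the output)
for t in range(1, 5001):
    x, y = regret_matching(Rx), regret_matching(Ry)
    ux, uy = A @ y, -(A.T @ x)             # per-action utilities for each player
    # 1) discount regrets from earlier iterations (linear-CFR-style weighting)
    Rx = (t - 1) / t * Rx + (ux - x @ ux)
    Ry = (t - 1) / t * Ry + (uy - y @ uy)
    # 2) reweight iterations when averaging: later strategies count more
    avg_x += t * x
    avg_y += t * y

print("row output strategy:", avg_x / avg_x.sum())   # roughly (1/3, 1/3, 1/3)
print("col output strategy:", avg_y / avg_y.sum())
```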
Smoothing Method for Approximate Extensive-Form Perfect Equilibrium
Develops a smoothing approach for behavioral perturbations of the convex polytope that encompasses the strategy spaces of players in an extensive-form game, which enables computation of an approximate variant of extensive-form perfect equilibria.
Monte Carlo Sampling for Regret Minimization in Extensive Games
Describes a general family of domain-independent sample-based CFR algorithms called Monte Carlo counterfactual regret minimization (MCCFR), of which the original CFR and poker-specific versions are special cases.
Fast Convergence of Regularized Learning in Games
Shows that natural classes of regularized learning algorithms with a form of recency bias achieve faster convergence rates to approximate efficiency and to coarse correlated equilibria…
Stable-Predictive Optimistic Counterfactual Regret Minimization
Presents the first CFR variant that breaks the square-root dependence on the number of iterations, and shows that this method is faster than the original CFR algorithm, although not as fast as newer variants, despite those variants' weaker worst-case $O(T^{-1/2})$ dependence on iterations.
Increasing Iterate Averaging for Solving Saddle-Point Problems
Shows that increasing averaging schemes, applied to various first-order methods, preserve the $O(1/T)$ convergence rate with no additional assumptions or computational overhead.
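The idea is simple to state: rather than weighting every iterate equally in the output average, give later iterates larger weights. Below is a hedged sketch of the bookkeeping only, with linear weights w_t = t chosen purely for illustration; the specific methods and weight schedules studied in the paper are not reproduced here.

```python
# Illustrative sketch: maintain a weighted running average of iterates x_1, x_2, ...
# with weights that increase in t (here w_t = t); w_t = 1 recovers uniform averaging.
import numpy as np

def weighted_running_average(iterates, weight=lambda t: float(t)):
    avg, total_weight = None, 0.0
    for t, x in enumerate(iterates, start=1):
        w = weight(t)
        total_weight += w
        # incremental update: avg <- avg + (w_t / W_t) * (x_t - avg)
        avg = np.array(x, dtype=float) if avg is None else avg + (w / total_weight) * (x - avg)
    return avg

# toy usage: iterates oscillating around (0.5, 0.5), e.g. produced by some saddle-point method
iterates = [np.array([0.5 + (-1.0) ** t / t, 0.5 - (-1.0) ** t / t]) for t in range(1, 200)]
print(weighted_running_average(iterates))                         # increasing (linear) weights
print(weighted_running_average(iterates, weight=lambda t: 1.0))   # uniform average, for comparison
```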
Last-Iterate Convergence: Zero-Sum Games and Constrained Min-Max Optimization
Shows that OMWU monotonically decreases the Kullback-Leibler divergence of the current iterate to the (appropriately normalized) min-max solution until it enters a neighborhood of the solution, where it becomes a contracting map converging to the exact solution.
Last iterate convergence in no-regret learning: constrained min-max optimization for convex-concave landscapes
Shows that Optimistic Multiplicative Weights Update (OMWU), which follows the no-regret online learning framework, exhibits last-iterate convergence locally for convex-concave games, generalizing the results of DP19, where last-iterate convergence of OMWU was shown only for the bilinear case.
Optimization, Learning, and Games with Predictable Sequences
Proves that a version of Optimistic Mirror Descent can be used by two strongly-uncoupled players in a finite zero-sum matrix game to converge to the minimax equilibrium at the rate of $O((\log T)/T)$.
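The mechanism behind the predictable-sequences idea can be sketched in a few lines: each player takes its step using the previous gradient as a prediction of the next one, then corrects a secondary iterate with the gradient actually observed. The sketch below uses Euclidean, unconstrained geometry on a bilinear saddle point, an assumption made for brevity rather than the simplex-constrained matrix-game setting of the paper; the matrix, step size, starting points, and iteration count are arbitrary.

```python
# Illustrative sketch of optimistic (predictive) gradient steps for the bilinear
# saddle point min_x max_y x^T A y, whose unique saddle point is the origin.
import numpy as np

A = np.array([[1.0, -0.5],
              [0.3,  0.8]])   # nonsingular, so (0, 0) is the unique saddle point
eta = 0.1

x_hat = np.array([1.0, -1.0])   # secondary ("hat") iterate of the min player
y_hat = np.array([-1.0, 0.5])   # secondary iterate of the max player
gx_pred, gy_pred = np.zeros(2), np.zeros(2)   # predictions: the previous gradients

for _ in range(2000):
    # play an optimistic step based on the predicted gradients
    x = x_hat - eta * gx_pred
    y = y_hat + eta * gy_pred
    # observe the true gradients of f(x, y) = x^T A y at the played point
    gx, gy = A @ y, A.T @ x
    # correct the secondary iterates with the observed gradients
    x_hat = x_hat - eta * gx
    y_hat = y_hat + eta * gy
    gx_pred, gy_pred = gx, gy   # next round's prediction = this round's gradient

print("last iterate:", x, y)    # both players approach the saddle point at the origin
```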