Corpus ID: 2529037

Efficient Nash equilibrium approximation through Monte Carlo counterfactual regret minimization

Michael Bradley Johanson, Nolan Bard, Marc Lanctot, Richard G. Gibson, Michael Bowling
Recently, there has been considerable progress towards algorithms for approximating Nash equilibrium strategies in extensive games. By sampling only the public chance outcomes seen by all players, we take advantage of the imperfect information structure of the game to (i) avoid recomputation of strategy probabilities, and (ii) achieve an algorithmic speed improvement, performing O(n^2) work at terminal nodes in O(n) time. We demonstrate that this new CFR update converges more quickly than chance…
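The O(n^2)-to-O(n) terminal-node idea can be illustrated with a hedged sketch: if showdown utility is the sign of a rank difference, sorting the hands and sweeping with running reach-probability sums replaces the pairwise loop. This is a simplified illustration (hypothetical function name), assuming both players range over the same hand set and ignoring the card-removal effects a real implementation must handle:

```python
def showdown_values(ranks, opp_reach):
    """Counterfactual showdown values in O(n log n) instead of O(n^2).

    ranks[i]: strength of private hand i (higher rank wins).
    opp_reach[i]: opponent's reach probability of holding hand i.
    Returns v with v[i] = sum_j opp_reach[j] * sign(ranks[i] - ranks[j]),
    computed by a single sweep in rank order instead of a pairwise loop.
    """
    n = len(ranks)
    order = sorted(range(n), key=lambda i: ranks[i])
    total = sum(opp_reach)
    v = [0.0] * n
    below = 0.0  # reach mass of strictly weaker hands seen so far
    j = 0
    while j < n:
        # group hands of equal rank so ties contribute zero utility
        k = j
        tie_mass = 0.0
        while k < n and ranks[order[k]] == ranks[order[j]]:
            tie_mass += opp_reach[order[k]]
            k += 1
        above = total - below - tie_mass  # mass of strictly stronger hands
        for t in range(j, k):
            v[order[t]] = below - above   # win mass minus loss mass
        below += tie_mass
        j = k
    return v
```

After the one-time sort, each hand's value is a difference of two running sums, which is what makes the per-iteration terminal work linear in the number of hands.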


Solving imperfect-information games via exponential counterfactual regret minimization
This paper proposes a novel CFR-based method, exponential counterfactual regret minimization (ECFR), presents an exponential regret-reduction technique applied during iteration, and proves that ECFR has a theoretical convergence guarantee.
Online Monte Carlo Counterfactual Regret Minimization for Search in Imperfect Information Games
It is shown that OOS can overcome the problem of non-locality encountered by previous search algorithms and performs well against worst-case opponents, whereas the preexisting Information Set Monte Carlo tree search (ISMCTS) can become more exploitable over time.
Search in Imperfect Information Games Using Online Monte Carlo Counterfactual Regret Minimization
This paper presents Online Outcome Sampling (OOS), the first imperfect information search algorithm that is guaranteed to converge to an equilibrium strategy in two-player zero-sum games and shows that unlike with Information Set Monte Carlo Tree Search (ISMCTS), the exploitability of the strategies produced by OOS decreases as the amount of search time increases.
Using Regret Estimation to Solve Games Compactly
It is suggested that such abstractions can be largely subsumed by a regressor on game features that estimates regret during CFR, and the regressor essentially becomes a tunable, compact, and dynamic abstraction of abstractions.
Deep Counterfactual Regret Minimization
Deep Counterfactual Regret Minimization is introduced, a form of CFR that obviates the need for abstraction by instead using deep neural networks to approximate the behavior of CFR in the full game.
Near-Optimal Learning of Extensive-Form Games with Imperfect Information
This paper resolves the open question of designing near-optimal algorithms for learning imperfect-information extensive-form games from bandit feedback. We present the first line of algorithms that…
Computing Strong Game-Theoretic Strategies and Exploiting Suboptimal Opponents in Large Games
This work proposes a new paradigm in which relevant portions of the game are solved in real time in much finer degrees of granularity than the abstract game which is solved offline, enabling us to solve games with significantly less abstraction for the initial betting rounds.
Solving Large Imperfect Information Games Using CFR+
CFR+ is a new algorithm that typically outperforms the previously known algorithms by an order of magnitude or more in terms of computation time, while also potentially requiring less memory.
Combining No-regret and Q-learning
A simple algorithm, local no-regret learning (LONR), is introduced; it uses a Q-learning-like update rule to allow learning without terminal states or perfect recall, and it is proved to achieve last-iterate convergence in the basic case of MDPs.
Variance Reduction in Monte Carlo Counterfactual Regret Minimization (VR-MCCFR) for Extensive Form Games using Baselines
A variance reduction technique (VR-MCCFR) that applies to any sampling variant of Monte Carlo Counterfactual Regret Minimization, and it is shown that given a perfect baseline, the variance of the value estimates can be reduced to zero.
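The baseline idea can be sketched as a control variate (a simplified single-decision illustration with hypothetical names, not the paper's full VR-MCCFR estimator): sample one action, keep the baseline for the others, and importance-weight the sampled action's correction so the estimate stays unbiased.

```python
import random

def baseline_corrected_estimates(utilities, baseline, q, rng=random):
    """Estimate every action's utility from a single sampled action.

    utilities[a]: true utility of action a (observed only if a is sampled).
    baseline[a]:  a learned or fixed guess of utilities[a].
    q[a]:         sampling probability of action a.
    The estimator is unbiased, and when baseline == utilities the
    correction term vanishes, so the variance drops to zero.
    """
    a_star = rng.choices(range(len(q)), weights=q)[0]
    est = list(baseline)
    est[a_star] += (utilities[a_star] - baseline[a_star]) / q[a_star]
    return est
```

With a perfect baseline the returned estimates equal the true utilities no matter which action was sampled, which is the zero-variance property the summary refers to.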


Monte Carlo Sampling for Regret Minimization in Extensive Games
A general family of domain-independent CFR sample-based algorithms called Monte Carlo counterfactual regret minimization (MCCFR) is described, of which the original and poker-specific versions are special cases.
Fast algorithms for finding randomized strategies in game trees
This paper describes a new representation of strategies that yields a practical linear formulation for two-player games with perfect recall (i.e., games where players never forget anything, a standard assumption).
Accelerating Best Response Calculation in Large Extensive Games
This paper details a general technique for best response computations that can often avoid a full game tree traversal and applies this approach to computing the worst-case performance of a number of strategies in heads-up limit Texas hold'em, which, prior to this work, was not possible.
Potential-Aware Automated Abstraction of Sequential Games, and Holistic Equilibrium Analysis of Texas Hold'em Poker
This paper is, to the knowledge, the first to abstract and game-theoretically analyze all four betting rounds in one run of Texas Hold'em poker (rather than splitting the game into phases).
Smoothing Techniques for Computing Nash Equilibria of Sequential Games
This work develops first-order smoothing techniques for the saddle-point problems that arise in finding a Nash equilibrium of sequential games. It introduces heuristics that significantly speed up the algorithm, as well as decomposed game representations that reduce memory requirements, enabling application of the techniques to drastically larger games.
A simple adaptive procedure leading to correlated equilibrium
We propose a new and simple adaptive procedure for playing a game: "regret-matching." In this procedure, players may depart from their current play with probabilities that are proportional to…
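The regret-matching rule is compact enough to sketch (a minimal illustration with hypothetical helper names): play each action with probability proportional to its positive cumulative regret, falling back to uniform when no regret is positive.

```python
def regret_matching(regrets):
    """Mixed strategy proportional to positive cumulative regrets."""
    pos = [max(r, 0.0) for r in regrets]
    s = sum(pos)
    n = len(regrets)
    if s <= 0:
        return [1.0 / n] * n  # uniform when nothing is regretted
    return [p / s for p in pos]

def update_regrets(regrets, utilities, strategy):
    """Accumulate each action's regret relative to the expected
    utility of the mixed strategy that was actually played."""
    ev = sum(u * p for u, p in zip(utilities, strategy))
    return [r + (u - ev) for r, u in zip(regrets, utilities)]
```

Repeating these two steps is the whole procedure; the "departure probabilities proportional to regret" in the quoted sentence correspond to the normalization in `regret_matching`.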
A Course in Game Theory
A Course in Game Theory presents the main ideas of game theory at a level suitable for graduate students and advanced undergraduates, emphasizing the theory's foundations and interpretations of its…
Regret Minimization in Games with Incomplete Information
It is shown how minimizing counterfactual regret minimizes overall regret, and therefore can be used in self-play to compute a Nash equilibrium. This is demonstrated in the domain of poker by solving abstractions of limit Texas Hold'em with as many as 10^12 states, two orders of magnitude larger than previous methods.
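To make the self-play claim concrete, here is a toy normal-form sketch (hypothetical names; rock-paper-scissors rather than the paper's extensive-form CFR): two regret-matching learners play each other, and their average strategies approach the uniform Nash equilibrium.

```python
def rps_selfplay(iters=20000):
    """Two regret-matching players in self-play on rock-paper-scissors.

    Returns each player's time-averaged strategy; in a two-player
    zero-sum game the average strategies approach a Nash equilibrium,
    which for RPS is uniform over (rock, paper, scissors).
    """
    payoff = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # row player's utility
    reg = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # asymmetric start avoids a fixed point
    avg = [[0.0] * 3, [0.0] * 3]
    for _ in range(iters):
        strats = []
        for p in range(2):
            pos = [max(r, 0.0) for r in reg[p]]
            s = sum(pos)
            strats.append([x / s for x in pos] if s > 0 else [1.0 / 3] * 3)
        for p in range(2):
            opp = strats[1 - p]
            # RPS is symmetric zero-sum, so both players can use the row payoffs
            u = [sum(payoff[a][b] * opp[b] for b in range(3)) for a in range(3)]
            ev = sum(ua * pa for ua, pa in zip(u, strats[p]))
            for a in range(3):
                reg[p][a] += u[a] - ev
                avg[p][a] += strats[p][a]
    return [[x / iters for x in row] for row in avg]
```

The current strategies cycle; it is only the running average that settles near (1/3, 1/3, 1/3), which is the same average-strategy convergence that CFR relies on in extensive games.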