Corpus ID: 10931318

Online Monte Carlo Counterfactual Regret Minimization for Search in Imperfect Information Games

@inproceedings{Lis2015OnlineMC,
  title={Online Monte Carlo Counterfactual Regret Minimization for Search in Imperfect Information Games},
  author={V. Lis{\'y} and Marc Lanctot and Michael Bowling},
  booktitle={AAMAS},
  year={2015}
}
Online search in games has been a core interest of artificial intelligence. Key result: in head-to-head play, OOS outperforms ISMCTS in games where non-locality plays a significant role, given sufficient computation time per move.


Monte Carlo Continual Resolving for Online Strategy Computation in Imperfect Information Games

TLDR
A domain-independent formulation of CR applicable to any two-player zero-sum extensive-form game (EFG) is presented, along with an empirical comparison of MCCR with incremental tree building against Online Outcome Sampling and Information-set MCTS on several domains.

Sound Algorithms in Imperfect Information Games

TLDR
It is argued that the fixed-strategy definitions of exploitability and ε-Nash equilibria are ill-suited to measure the worst-case performance of an online algorithm, and the definition of soundness and the consistency hierarchy are formalized.

A Fast-Convergence Method of Monte Carlo Counterfactual Regret Minimization for Imperfect Information Dynamic Games

TLDR
Semi-OS, a fast-convergence method developed from Outcome-Sampling MCCFR (OS), the most popular variant of MCCFR, is introduced, and it is shown that, by selecting an appropriate discount rate, Semi-OS not only significantly speeds up convergence in Leduc poker but also statistically outperforms OS in head-to-head matches of Leduc poker, a common testbed of imperfect information games.

Sound Search in Imperfect Information Games

TLDR
The definition of soundness and the consistency hierarchy finally provide appropriate tools to analyze online algorithms in imperfect information games.

Near-Optimal Learning of Extensive-Form Games with Imperfect Information

This paper resolves the open question of designing near-optimal algorithms for learning imperfect-information extensive-form games from bandit feedback. We present the first line of algorithms that…

Analysis of Hannan consistent selection for Monte Carlo tree search in simultaneous move games

TLDR
An additional property of HC algorithms is defined that is sufficient to guarantee convergence without averaging, and it is shown empirically that commonly used HC algorithms have this property.

Monte Carlo Neural Fictitious Self-Play: Approach to Approximate Nash equilibrium of Imperfect-Information Games

TLDR
The proposed Monte Carlo Neural Fictitious Self-Play (MC-NFSP) algorithm combines Monte Carlo tree search with NFSP, greatly improving performance on large-scale zero-sum imperfect-information games.

Smooth UCT Search in Computer Poker

TLDR
Smooth UCT is introduced, a variant of the established Upper Confidence Bounds Applied to Trees algorithm that outperformed UCT in Limit Texas Hold'em and won 3 silver medals in the 2014 Annual Computer Poker Competition.
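Smooth UCT builds on the UCB1 selection rule that plain UCT uses: pick the action maximizing the empirical mean reward plus an exploration bonus. As a rough illustration only (the bandit setup, function names, and numbers below are invented for this sketch, not taken from the paper):

```python
import math
import random

def ucb1_select(counts, values, c=math.sqrt(2)):
    """Pick the arm maximizing empirical mean plus an exploration bonus."""
    total = sum(counts)
    for i, n in enumerate(counts):
        if n == 0:
            return i  # try every arm once before using the formula
    scores = [values[i] / counts[i] + c * math.sqrt(math.log(total) / counts[i])
              for i in range(len(counts))]
    return scores.index(max(scores))

# Tiny two-armed bandit: arm 1 pays more in expectation, so UCB1 should
# concentrate its pulls there while still occasionally exploring arm 0.
random.seed(0)
means = [0.3, 0.7]
counts, values = [0, 0], [0.0, 0.0]
for _ in range(5000):
    i = ucb1_select(counts, values)
    counts[i] += 1
    values[i] += 1.0 if random.random() < means[i] else 0.0
```

In MCTS this rule is applied at every tree node, with each child action treated as an arm; Smooth UCT additionally mixes in the average strategy, which is what lets it approach equilibrium play in imperfect information games.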

Approximate exploitability: Learning a best response in large games

TLDR
A new metric, approximate exploitability, is introduced; it mirrors exploitability but uses an approximate best response, and it can consistently find exploits for weak policies in large games, with results shown on Chess, Go, Heads-up No-Limit Texas Hold'em, and other games.

CFR-MIX: Solving Imperfect Information Extensive-Form Games with Combinatorial Action Space

TLDR
This work proposes a new strategy representation that expresses a joint-action strategy using the individual strategies of all agents, together with a consistency relationship that maintains cooperation between agents, and introduces the new algorithm CFR-MIX, which employs a mixing layer to estimate cumulative regret values of joint actions as a non-linear combination of the cumulative regret values of individual actions.
...

References

Showing 1–10 of 40 references

Efficient Nash equilibrium approximation through Monte Carlo counterfactual regret minimization

TLDR
This work presents new sampling techniques that consider sets of chance outcomes during each traversal to produce slower, more accurate iterations of Counterfactual Regret Minimization, and demonstrates that this new CFR update converges more quickly than chance-sampled CFR in the large domains of poker and Bluff.

Monte Carlo Sampling for Regret Minimization in Extensive Games

TLDR
A general family of domain-independent CFR sample-based algorithms called Monte Carlo counterfactual regret minimization (MCCFR) is described, of which the original and poker-specific versions are special cases.
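The "outcome sampling" idea behind MCCFR can be illustrated in miniature. The sketch below is hypothetical and not from the paper: it uses a one-shot matrix game rather than an extensive-form game, and the opponent strategy is an invented example. It shows the key trick: sample a single outcome from an exploration distribution and importance-weight it so the utility estimate stays unbiased.

```python
import numpy as np

# One-shot rock-paper-scissors payoffs for the row player.
A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])
opp = np.array([0.5, 0.3, 0.2])   # fixed opponent strategy (invented numbers)
exact = A @ opp                    # exact expected utility of each row action

rng = np.random.default_rng(0)
q = np.full(3, 1.0 / 3.0)          # sampling (exploration) distribution
n = 200_000
js = rng.choice(3, size=n, p=q)    # sample one opponent action per iteration
# Importance-weight each sampled outcome by opp[j] / q[j] so the estimator
# is unbiased: E_{j~q}[A[:, j] * opp[j] / q[j]] = A @ opp.
est = (A[:, js] * (opp[js] / q[js])).mean(axis=1)
```

In full MCCFR the same correction is applied along a sampled trajectory through the game tree, dividing by the probability of reaching the sampled terminal history, so regrets can be updated from a single playout per iteration.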

Computing Approximate Nash Equilibria and Robust Best-Responses Using Sampling

TLDR
Monte-Carlo Restricted Nash Response (MCRNR) is a sample-based algorithm for computing restricted Nash strategies: robust best-response strategies that exploit non-NE opponents more than playing a NE would, while not being (overly) exploitable by other strategies.

Recursive Monte Carlo search for imperfect information games

  • T. Furtak, M. Buro
  • Computer Science
    2013 IEEE Conference on Computational Intelligence in Games (CIG)
  • 2013
TLDR
RecPIMC, a recursive IIMC search variant based on perfect-information evaluation, performs considerably better than PIMC search in a large class of synthetic imperfect information games and in the popular card game Skat, for which PIMC search is the state-of-the-art cardplay algorithm.

Solving Imperfect Information Games Using Decomposition

TLDR
This work presents the first technique for decomposing an imperfect information game into subgames that can be solved independently while retaining optimality guarantees on the full-game solution, and gives an algorithm for subgame solving that guarantees performance in the whole game, in contrast to existing methods, which may have unbounded error.

Regret Minimization in Games with Incomplete Information

TLDR
It is shown that minimizing counterfactual regret minimizes overall regret, and therefore can be used in self-play to compute a Nash equilibrium; this is demonstrated in the domain of poker, solving abstractions of limit Texas Hold'em with as many as 10^12 states, two orders of magnitude larger than previous methods.
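At each information set, CFR runs a regret-matching update: play each action with probability proportional to its positive cumulative regret, and in self-play the average strategy converges toward equilibrium. A minimal sketch of that core loop on rock-paper-scissors (assumed details not from the paper: the payoff matrix, the small asymmetric initial regret used to move play off the fixed point, and the iteration count):

```python
import numpy as np

A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])      # rock-paper-scissors, row player's payoffs

def regret_matching(cum_regret):
    """Play proportionally to positive cumulative regret (uniform if none)."""
    pos = np.maximum(cum_regret, 0.0)
    return pos / pos.sum() if pos.sum() > 0 else np.full(3, 1.0 / 3.0)

iters = 50_000
r1 = np.array([1.0, 0.0, 0.0])     # small asymmetric start to leave the fixed point
r2 = np.zeros(3)
s1_sum, s2_sum = np.zeros(3), np.zeros(3)
for _ in range(iters):
    s1, s2 = regret_matching(r1), regret_matching(r2)
    s1_sum += s1
    s2_sum += s2
    u1 = A @ s2                    # expected utility of each row action
    u2 = -A.T @ s1                 # expected utility of each column action
    r1 += u1 - s1 @ u1             # regret for not having played each action
    r2 += u2 - s2 @ u2
# The average strategies approach the uniform Nash equilibrium (1/3, 1/3, 1/3).
avg1, avg2 = s1_sum / iters, s2_sum / iters
```

Full CFR applies this same update at every information set of the extensive-form game, weighting regrets by counterfactual reach probabilities.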

Convergence of Monte Carlo Tree Search in Simultaneous Move Games

TLDR
It is formally proved that if a selection method is ε-Hannan consistent in a matrix game and satisfies additional requirements on exploration, then the MCTS algorithm eventually converges to an approximate Nash equilibrium of the extensive-form game.

Monte Carlo Tree Search in Simultaneous Move Games with Applications to Goofspiel

TLDR
This paper discusses the adaptation of MCTS to simultaneous move games, and introduces a new algorithm, Online Outcome Sampling (OOS), that approaches a Nash equilibrium strategy over time.

Self-play Monte-Carlo tree search in computer poker

TLDR
This paper introduces a variant of the established UCB algorithm and provides first results demonstrating its ability to find approximate Nash equilibria in self-play Monte Carlo Tree Search in limit Texas Hold'em and Kuhn poker.

Finding Optimal Strategies for Imperfect Information Games

TLDR
These algorithms are compared theoretically and experimentally using both simple game trees and a large database of problems from the game of Bridge, showing that the new algorithms both outperform Monte Carlo sampling, with the superiority of payoff-reduction minimaxing being especially marked.