Corpus ID: 1465907

Self-play Monte-Carlo tree search in computer poker

Johannes Heinrich and David Silver. AAAI 2014.
© Copyright 2014. Key Result: We introduce a variant of the established UCB algorithm and provide first empirical results demonstrating its ability to find approximate Nash equilibria.

Smooth UCT Search in Computer Poker

Smooth UCT is introduced, a variant of the established Upper Confidence Bounds Applied to Trees algorithm that outperformed UCT in Limit Texas Hold'em and won 3 silver medals in the 2014 Annual Computer Poker Competition.

Online Monte Carlo Counterfactual Regret Minimization for Search in Imperfect Information Games

It is shown that OOS can overcome the problem of non-locality encountered by previous search algorithms and perform well against its worst-case opponents and that preexisting Information Set Monte Carlo tree search (ISMCTS) can get more exploitable over time.

Pruning Playouts in Monte-Carlo Tree Search for the Game of Havannah

This paper proposes a new method to bias the playout policy of MCTS by pruning the decisions that seem “bad” (according to the previous iterations of the algorithm) before computing each playout, and evaluates the estimated “good” moves more precisely.

Monte Carlo Tree Search for games with hidden information and uncertainty

The ISMCTS algorithm is shown to outperform the existing approach of Perfect Information Monte Carlo (PIMC) search and can be used to solve two known issues with PIMC search, namely strategy fusion and non-locality.

Deep Reinforcement Learning from Self-Play in Imperfect-Information Games

This paper introduces the first scalable end-to-end approach to learning approximate Nash equilibria without prior domain knowledge, and combines fictitious self-play with deep reinforcement learning.

Emergent bluffing and inference with Monte Carlo Tree Search

This paper augments Monte Carlo Tree Search with mechanisms for performing inference and bluffing, and shows that this model can be repurposed to perform an approximation of Bayesian inference.

On the Tactical and Strategic Behaviour of MCTS when Biasing Random Simulations

The results indicate that improved Monte-Carlo policies, such as PoolRave or Last-Good-Reply, work better for games with a strong tactical element for small numbers of random simulations, whereas more general policies seem to be more suited for games with a strong strategic element for higher numbers of random simulations.

Imperfect and Cooperative Guandan Game System

The experimental results show that the improved UCT algorithm plays more intelligently than a random strategy, and that it has some significance for solving game problems involving cooperative relationships.

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms

This chapter reviews the theoretical results of MARL algorithms mainly within two representative frameworks, Markov/stochastic games and extensive-form games, in accordance with the types of tasks they address, i.e., fully cooperative, fully competitive, and a mix of the two.

Enhanced Rolling Horizon Evolution Algorithm with Opponent Model Learning: Results for the Fighting Game AI Competition

A novel algorithm that combines Rolling Horizon Evolution Algorithm (RHEA) with opponent model learning and is optimized by supervised learning with cross-entropy and reinforcement learning with policy gradient and Q-learning respectively is proposed.



Information Set Monte Carlo Tree Search

Three new information set MCTS (ISMCTS) algorithms are presented which handle different sources of hidden information and uncertainty in games. Instead of searching minimax trees of game states, the ISMCTS algorithms search trees of information sets, more directly analyzing the true structure of the game.

Computing Approximate Nash Equilibria and Robust Best-Responses Using Sampling

Monte-Carlo Restricted Nash Response (MCRNR) is a sample-based algorithm for computing restricted Nash strategies: robust best-response strategies that exploit non-NE opponents more than playing a NE would and are not (overly) exploitable by other strategies.

Regret Minimization in Games with Incomplete Information

It is shown how minimizing counterfactual regret minimizes overall regret, and therefore in self-play can be used to compute a Nash equilibrium. This is demonstrated in the domain of poker, where the approach can solve abstractions of limit Texas Hold'em with as many as 10^12 states, two orders of magnitude larger than previous methods.

Monte Carlo Sampling for Regret Minimization in Extensive Games

A general family of domain-independent CFR sample-based algorithms called Monte Carlo counterfactual regret minimization (MCCFR) is described, of which the original and poker-specific versions are special cases.

Temporal-difference search in computer Go

This work applies temporal-difference search to the game of 9×9 Go, using a million binary features matching simple patterns of stones, and outperformed an unenhanced Monte-Carlo tree search with the same number of simulations.

Algorithms and assessment in computer poker

A major theme of this dissertation is the evolution of architectures for poker-playing programs that has occurred since the research began in 1992, and four distinct approaches are addressed: knowledge-based systems, simulation, game-theoretic methods, and adaptive imperfect information game-tree search.

Approximating Game-Theoretic Optimal Strategies for Full-scale Poker

The computation of the first complete approximations of game-theoretic optimal strategies for full-scale poker is addressed, and linear programming solutions to the abstracted game are used to create substantially improved poker-playing programs.

The State of Solving Large Incomplete-Information Games, and Application to Poker

In short, game-theoretic reasoning now scales to many large problems, outperforms the alternatives on those problems, and in some games beats the best humans.

Finite-time Analysis of the Multiarmed Bandit Problem

This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.

Gambling in a rigged casino: The adversarial multi-armed bandit problem

A solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs is given.