Cross-Entropy for Monte-Carlo Tree Search

@article{Chaslot2008CrossEntropyFM,
  title={Cross-Entropy for Monte-Carlo Tree Search},
  author={Guillaume Chaslot and Mark H. M. Winands and Istv{\'a}n Szita and H. Jaap van den Herik},
  journal={J. Int. Comput. Games Assoc.},
  year={2008},
  volume={31},
  pages={145--156}
}
Recently, Monte-Carlo Tree Search (MCTS) has become a popular approach for intelligent play in games. Amongst others, it is successfully used in most state-of-the-art Go programs. To improve the playing strength of these Go programs any further, the many parameters governing MCTS must be fine-tuned. In this paper, we propose to apply the Cross-Entropy Method (CEM) to this task. The method is comparable to Estimation-of-Distribution Algorithms (EDAs), a new area of evolutionary computation. We…
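The tuning loop CEM performs — sample candidate parameter values from a distribution, score each candidate (e.g. by an estimate of playing strength), and refit the distribution to the best-scoring "elite" samples — can be sketched as follows. Everything here (function names, the toy objective, the hyperparameters) is illustrative, not taken from the paper:

```python
import random
import statistics

def cross_entropy_tune(score, mu=0.5, sigma=0.5,
                       n_samples=50, n_elite=10, n_iters=30):
    """Tune one scalar parameter with the Cross-Entropy Method:
    sample candidates from a Gaussian, keep the top n_elite by
    score, and refit the Gaussian to those elites."""
    for _ in range(n_iters):
        candidates = [random.gauss(mu, sigma) for _ in range(n_samples)]
        elites = sorted(candidates, key=score, reverse=True)[:n_elite]
        mu = statistics.mean(elites)
        sigma = statistics.stdev(elites) + 1e-6  # keep a little spread
    return mu

# Toy objective: a noisy score that peaks at 0.3 (a stand-in for a
# win-rate measurement of an MCTS program at one parameter setting).
random.seed(0)
best = cross_entropy_tune(lambda c: -(c - 0.3) ** 2 + random.gauss(0, 0.01))
```

In the paper's setting, the score function would be a noisy estimate of a Go program's strength at a given parameter value, with one distribution maintained per tuned parameter.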


Single-player Monte-Carlo tree search for SameGame
Self-Adaptive Monte Carlo Tree Search in General Game Playing
TLDR
A self-adaptive MCTS strategy (SA-MCTS) that integrates within the search a method to automatically tune search-control parameters online per game and presents five different allocation strategies that decide how to allocate available samples to evaluate parameter values.
On-Line Parameter Tuning for Monte-Carlo Tree Search in General Game Playing
TLDR
This paper proposes a method to automatically tune search-control parameters on-line for GGP, and considers the tuning problem as a Combinatorial Multi-Armed Bandit (CMAB).
Monte Carlo Tree Search for the Game of Diplomacy
TLDR
This paper explores the application of Monte Carlo Tree Search in the "Diplomacy" multi-agent strategic board game, putting forward eight variants of MCTS Diplomacy agents and providing a thorough experimental evaluation of the approach.
A Survey of Monte Carlo Tree Search Methods
TLDR
A survey of the literature to date of Monte Carlo tree search, intended to provide a snapshot of the state of the art after the first five years of MCTS research, outlines the core algorithm's derivation, imparts some structure on the many variations and enhancements that have been proposed, and summarizes the results from the key game and non-game domains.
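The core selection rule in most MCTS variants is UCT: from a node, descend to the child maximizing average reward plus an exploration bonus. The exploration constant is exactly the kind of search-control parameter that tuning methods such as CEM target. A minimal sketch (names and the default constant are illustrative):

```python
import math

def uct_select(parent_visits, children, C=1.4142):
    """Return the index of the child maximizing the UCT value
    mean + C * sqrt(ln(parent_visits) / visits).

    children: list of (mean_reward, visit_count) pairs; C is the
    exploration constant, one of the parameters tuning methods target."""
    def uct(child):
        mean, visits = child
        if visits == 0:
            return float("inf")  # try unvisited children first
        return mean + C * math.sqrt(math.log(parent_visits) / visits)
    return max(range(len(children)), key=lambda i: uct(children[i]))
```

For example, with a parent visited 102 times, a child at (0.5, 100) loses to a child at (0.4, 2): the rarely visited child's exploration bonus dominates its lower average.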
MCTS/UCT in Solving Real-Life Problems
  • J. Mańdziuk
  • Computer Science
    Advances in Data Analysis with Computational Intelligence Methods
  • 2018
TLDR
This paper summarizes the studies in application of MCTS/UCT to domains other than games, with particular emphasis on hard real-life problems which possess a large degree of uncertainty due to the existence of stochastic factors in their definition.
Variance Reduction in Monte-Carlo Tree Search
TLDR
This paper examines the application of some standard techniques for variance reduction in MCTS, including common random numbers, antithetic variates and control variates, and demonstrates their efficacy on three different stochastic, single-agent settings: Pig, Can't Stop and Dominion.
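Of the techniques named above, antithetic variates are the simplest to illustrate: each uniform draw u is paired with 1 − u, and because the pair is negatively correlated for monotone integrands, averaging the two cancels part of the noise. A toy sketch outside any game setting (all names are illustrative):

```python
import math
import random
import statistics

def estimate_plain(f, n, rng):
    """Naive Monte-Carlo estimate of E[f(U)], U ~ Uniform(0, 1)."""
    return statistics.mean(f(rng.random()) for _ in range(n))

def estimate_antithetic(f, n, rng):
    """Same budget of n evaluations, but each draw u is paired with
    1 - u; for monotone f the pair is negatively correlated, which
    lowers the variance of the average."""
    return statistics.mean((f(u) + f(1 - u)) / 2
                           for u in (rng.random() for _ in range(n // 2)))

# Compare the spread of the two estimators over repeated runs.
rng = random.Random(1)
plain_var = statistics.pvariance(
    [estimate_plain(math.exp, 100, rng) for _ in range(200)])
anti_var = statistics.pvariance(
    [estimate_antithetic(math.exp, 100, rng) for _ in range(200)])
```

At the same evaluation budget, the antithetic estimator's variance over repeated runs is markedly lower than the plain one's.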
Improvements on Learning Tetris with Cross Entropy
TLDR
This work uses the cross-entropy method to tune an evaluation-based one-piece controller, as suggested by Szita and Lőrincz, introduces some improvements, and shows that on the original game of Tetris this approach leads to a controller that outperforms the previously known results.
Monte-Carlo Tree Search Applied to the Game of Havannah
Playing the game of Havannah is difficult for computers. Nevertheless, Monte-Carlo Tree Search (MCTS) has been successfully applied to play Havannah in the past. In this paper several enhancements to

References

SHOWING 1-10 OF 32 REFERENCES
Bandit Algorithms for Tree Search
TLDR
A Bandit Algorithm for Smooth Trees (BAST) is introduced which takes into account actual smoothness of the rewards for performing efficient "cuts" of sub-optimal branches with high confidence and is illustrated on a global optimization problem of a continuous function, given noisy values.
Progressive Strategies for Monte-Carlo Tree Search
TLDR
Two progressive strategies for MCTS are introduced, called progressive bias and progressive unpruning, which enable the use of relatively time-expensive heuristic knowledge without speed reduction.
The Cross-Entropy Method for Combinatorial and Continuous Optimization
TLDR
The mode of a unimodal importance sampling distribution, such as the mode of the beta distribution, is used as an estimate of the optimal solution for continuous optimization, and a Markov-chain approach is used for combinatorial optimization.
Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search
TLDR
A new framework is presented that combines tree search with Monte-Carlo evaluation without separating a min-max phase from a Monte-Carlo phase; it provides fine-grained control of the tree growth, at the level of individual simulations, and allows efficient selectivity.
RSPSA: Enhanced Parameter Optimization in Games
TLDR
This article proposes several methods to speed up SPSA, in particular the combination with RPROP, the use of common random numbers, antithetic variates, and averaging, and tests the resulting algorithm for tuning various types of parameters in two domains, Poker and LOA.
Modification of UCT with Patterns in Monte-Carlo Go
TLDR
A Monte-Carlo Go program, MoGo, the first computer Go program using UCT, is developed; the modification of UCT for the Go application is explained, as well as the intelligent random simulation with patterns, which has significantly improved MoGo's performance.
Monte-Carlo Go Reinforcement Learning Experiments
TLDR
The result obtained by the automatic learning experiments is better than the manual method by a 3-point margin on average, which is satisfactory, and the current results are promising on 19×19 boards.
Computing "Elo Ratings" of Move Patterns in the Game of Go
  • Rémi Coulom
  • Computer Science
    J. Int. Comput. Games Assoc.
  • 2007
TLDR
A new Bayesian technique for supervised learning of move patterns from game records, based on a generalization of Elo ratings, which outperforms most previous pattern-learning algorithms, both in terms of mean log-evidence, and prediction rate.
Learning extension parameters in game-tree search
Experiments in Parameter Learning Using Temporal Differences
TLDR
Some experiments are described in which the chess program KnightCap learnt the parameters of its evaluation function using a combination of Temporal Difference learning and on-line play on FICS and ICC.