Cross-Entropy for Monte-Carlo Tree Search

  • Guillaume Chaslot, Mark H. M. Winands, István Szita, H. Jaap van den Herik
  • J. Int. Comput. Games Assoc.
Recently, Monte-Carlo Tree Search (MCTS) has become a popular approach for intelligent play in games. Amongst others, it is successfully used in most state-of-the-art Go programs. To improve the playing strength of these Go programs any further, many parameters dealing with MCTS should be fine-tuned. In this paper, we propose to apply the Cross-Entropy Method (CEM) for this task. The method is comparable to Estimation-of-Distribution Algorithms (EDAs), a new area of evolutionary computation. We… 
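
The Cross-Entropy Method the abstract refers to can be sketched as a simple iterative loop: sample parameter vectors from a Gaussian, evaluate each, refit the Gaussian to the best-scoring "elite" samples, and repeat. The sketch below uses a toy quadratic objective as a stand-in for the expensive self-play evaluation a Go program would actually run; the function name and the optimum are illustrative assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def playing_strength(params):
    # Hypothetical stand-in for an expensive MCTS self-play evaluation:
    # higher is better, with a peak at some unknown optimal parameter vector.
    optimum = np.array([0.7, 1.5, -0.3])
    return -np.sum((params - optimum) ** 2)

def cem(dim=3, pop=50, elite_frac=0.2, iters=40):
    """Cross-Entropy Method: repeatedly sample, keep the elites, refit."""
    mean, std = np.zeros(dim), np.full(dim, 2.0)
    n_elite = int(pop * elite_frac)
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(pop, dim))
        scores = np.array([playing_strength(s) for s in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]
        # Refit the sampling distribution to the elite set.
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean
```

In a real tuning run, `playing_strength` would be replaced by win rate over a batch of games, which is noisy; larger populations or averaging repeated evaluations per sample are the usual mitigations.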


Single-player Monte-Carlo tree search for SameGame

Self-Adaptive Monte Carlo Tree Search in General Game Playing

A self-adaptive MCTS strategy (SA-MCTS) is proposed that integrates within the search a method to automatically tune search-control parameters online, per game; five different allocation strategies are presented that decide how to allocate the available samples to evaluate parameter values.

On-Line Parameter Tuning for Monte-Carlo Tree Search in General Game Playing

This paper proposes a method to automatically tune search-control parameters on-line for GGP, treating the tuning problem as a Combinatorial Multi-Armed Bandit (CMAB).

Monte Carlo Tree Search for the Game of Diplomacy

This paper explores the application of Monte Carlo Tree Search in the “Diplomacy” multi-agent strategic board game, by putting forward and evaluating eight variants of MCTS Diplomacy agents and providing a thorough experimental evaluation of the approach.

A Survey of Monte Carlo Tree Search Methods

A survey of the literature to date of Monte Carlo tree search, intended to provide a snapshot of the state of the art after the first five years of MCTS research, outlines the core algorithm's derivation, imparts some structure on the many variations and enhancements that have been proposed, and summarizes the results from the key game and non-game domains.

MCTS/UCT in Solving Real-Life Problems

  • J. Mańdziuk
  • Computer Science
    Advances in Data Analysis with Computational Intelligence Methods
  • 2018
This paper summarizes studies applying MCTS/UCT to domains other than games, with particular emphasis on hard real-life problems that possess a large degree of uncertainty due to stochastic factors in their definition.

Variance Reduction in Monte-Carlo Tree Search

This paper examines the application of some standard techniques for variance reduction in MCTS, including common random numbers, antithetic variates and control variates, and demonstrates their efficacy on three different stochastic, single-agent settings: Pig, Can't Stop and Dominion.
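
Of the variance-reduction techniques named above, antithetic variates are the simplest to illustrate: each uniform draw u is paired with its mirror 1 - u, and the two (negatively correlated) evaluations are averaged. The sketch below is a minimal self-contained demonstration on a plain Monte-Carlo mean estimate, not the MCTS-specific formulation from the paper; the function name is an assumption.

```python
import random
import statistics

def estimate_mean(f, n, antithetic=False, seed=0):
    """Estimate E[f(U)] for U ~ Uniform(0, 1); return (mean, sample variance)."""
    rng = random.Random(seed)
    vals = []
    if antithetic:
        for _ in range(n // 2):
            u = rng.random()
            # Mirror-image pair: f(u) and f(1 - u) are negatively
            # correlated for monotone f, so their average has lower variance.
            vals.append((f(u) + f(1 - u)) / 2)
    else:
        for _ in range(n):
            vals.append(f(rng.random()))
    return statistics.mean(vals), statistics.variance(vals)
```

For a monotone integrand such as f(u) = u², the per-sample variance of the antithetic pairs is well below that of plain sampling, so the same number of function evaluations yields a tighter estimate.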

Improvements on Learning Tetris with Cross Entropy

This work considers this method to tune an evaluation-based one-piece controller, as suggested by Szita and Lőrincz, introduces some improvements, and shows that on the original game of Tetris this approach leads to a controller that outperforms previously known results.

Monte-Carlo Tree Search Applied to the Game of Havannah

Playing the game of Havannah is difficult for computers. Nevertheless, Monte-Carlo Tree Search (MCTS) has been successfully applied to play Havannah in the past. In this paper several enhancements to…

Bandit Algorithms for Tree Search

A Bandit Algorithm for Smooth Trees (BAST) is introduced which takes into account actual smoothness of the rewards for performing efficient "cuts" of sub-optimal branches with high confidence and is illustrated on a global optimization problem of a continuous function, given noisy values.

Progressive Strategies for Monte-Carlo Tree Search

Two progressive strategies for MCTS are introduced, called progressive bias and progressive unpruning, which enable the use of relatively time-expensive heuristic knowledge without speed reduction.

The Cross-Entropy Method for Combinatorial and Continuous Optimization

The mode of a unimodal importance-sampling distribution, such as the mode of a beta distribution, is used as an estimate of the optimal solution for continuous optimization, and a Markov-chain approach is used for combinatorial optimization.

Modification of UCT with Patterns in Monte-Carlo Go

A Monte-Carlo Go program, MoGo, the first computer Go program to use UCT, is developed; the modification of UCT for the Go application is explained, as is the intelligent random simulation with patterns, which has significantly improved MoGo's performance.

Monte-Carlo Go Reinforcement Learning Experiments

The result obtained by the automatic learning experiments is better than the manual method by a 3-point margin on average, which is satisfactory, and the current results are promising on 19×19 boards.

Computing "Elo Ratings" of Move Patterns in the Game of Go

  • Rémi Coulom
  • Computer Science
    J. Int. Comput. Games Assoc.
  • 2007
A new Bayesian technique for supervised learning of move patterns from game records, based on a generalization of Elo ratings, which outperforms most previous pattern-learning algorithms both in terms of mean log-evidence and prediction rate.

Learning extension parameters in game-tree search

Temporal Difference Learning for Heuristic Search and Game Playing

Temporal Difference Learning and TD-Gammon

  • G. Tesauro
  • Computer Science
    J. Int. Comput. Games Assoc.
  • 1995
The domain of complex board games such as Go, chess, checkers, Othello, and backgammon has been widely regarded as an ideal testing ground for exploring a variety of concepts and approaches in artificial intelligence and machine learning.

Bandit Based Monte-Carlo Planning

A new algorithm, UCT, is introduced that applies bandit ideas to guide Monte-Carlo planning; it is shown to be consistent, and finite-sample bounds are derived on the estimation error due to sampling.
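
The bandit idea at the heart of UCT is the UCB1 rule: at each node, pick the child maximizing its mean reward plus an exploration bonus that shrinks as the child is visited more. A minimal sketch of that selection step (function name and the toy two-armed test are illustrative, not from the paper):

```python
import math
import random

def ucb1_select(counts, values, c=math.sqrt(2)):
    """Pick the index maximizing mean + c * sqrt(ln N / n) (the UCB1 rule).

    counts[i] is the visit count of child i; values[i] is its total reward.
    """
    total = sum(counts)
    best, best_score = 0, -float("inf")
    for i, (n, v) in enumerate(zip(counts, values)):
        if n == 0:
            return i  # visit every child once before comparing bounds
        score = v / n + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best, best_score = i, score
    return best
```

In full UCT this rule is applied recursively from the root to choose a path down the tree, and the reward from a Monte-Carlo rollout at the leaf is backed up along that path.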