Improved Monte-Carlo Search

Abstract

Monte-Carlo search has been successful in many non-deterministic games, and recently in deterministic games with a high branching factor. One drawback of current approaches is that, even if the iterative process runs for a very long time, the selected move does not necessarily converge to a game-theoretically optimal one. In this paper we introduce a new algorithm, UCT, which extends a bandit algorithm to Monte-Carlo search. It is proven that the probability that the algorithm selects the correct move converges to 1. Moreover, it is shown empirically that the algorithm converges rather fast, even in comparison with alpha-beta search. Experiments in Amazons and Clobber indicate that the UCT algorithm considerably outperforms a plain Monte-Carlo version and is competitive against alpha-beta based game programs.
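
The abstract does not say which bandit rule UCT extends. As a rough illustration only, the sketch below assumes a UCB1-style rule, in which each candidate move is scored by its average reward plus an exploration bonus that shrinks as the move is sampled more often. The function name `ucb1_select`, the arguments `counts` and `totals`, and the constant `c` are hypothetical and are not taken from the paper.

```python
import math


def ucb1_select(counts, totals, c=math.sqrt(2)):
    """Pick the index of the move to sample next (UCB1-style sketch).

    counts[i] is how often move i has been sampled so far, and
    totals[i] is the sum of rewards observed for move i.
    The exploration constant c is an assumption, not a value from the paper.
    """
    # Sample every move once before comparing confidence bounds.
    for i, n in enumerate(counts):
        if n == 0:
            return i

    total = sum(counts)

    def score(i):
        mean = totals[i] / counts[i]
        bonus = c * math.sqrt(math.log(total) / counts[i])
        return mean + bonus

    return max(range(len(counts)), key=score)


if __name__ == "__main__":
    # Two moves with equal sample counts: the higher average reward wins.
    print(ucb1_select([10, 10], [9.0, 1.0]))
    # A rarely sampled move gets a large bonus and is tried again.
    print(ucb1_select([100, 1], [60.0, 0.0]))
```

Under this assumption, a plain Monte-Carlo player would sample moves uniformly, whereas a rule of this kind concentrates samples on promising moves while still revisiting neglected ones from time to time.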

7 Figures and Tables

Cite this paper

@inproceedings{Kocsis2006ImprovedMS, title={Improved Monte-Carlo Search}, author={Levente Kocsis and Csaba Szepesv{\'a}ri and Jan Willemson}, year={2006} }