Bandit Based Monte-Carlo Planning


For large state-space Markovian Decision Problems, Monte-Carlo planning is one of the few viable approaches to finding near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algorithm is shown to be consistent, and finite-sample bounds are derived…
DOI: 10.1007/11871842_29
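The "bandit ideas" the abstract refers to are UCB1-style upper confidence bounds applied at each node of the search tree. The following is a minimal sketch, not the paper's implementation: a one-level UCT (which reduces to UCB1 over the root's actions) on a toy Bernoulli-reward problem, with a hypothetical `uct_bandit` helper and an exploration constant of √2 chosen for illustration.

```python
import math
import random

def ucb1(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: mean reward plus an exploration bonus."""
    # Unvisited children get an infinite score so each is tried once.
    if visits == 0:
        return float("inf")
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)

def uct_bandit(arm_means, n_iters=5000, seed=0):
    """One-level UCT: repeatedly pick the action maximizing UCB1,
    sample its Bernoulli reward, and update that action's statistics."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    visits = [0] * n_arms
    rewards = [0.0] * n_arms
    for t in range(1, n_iters + 1):
        scores = [ucb1(rewards[i], visits[i], t) for i in range(n_arms)]
        i = max(range(n_arms), key=lambda j: scores[j])
        r = 1.0 if rng.random() < arm_means[i] else 0.0
        visits[i] += 1
        rewards[i] += r
    return visits

# Over many iterations, the highest-mean action should dominate the visits.
visits = uct_bandit([0.2, 0.5, 0.8])
```

In the full algorithm this selection rule is applied recursively down the tree, with rollout returns backed up along the visited path; the logarithmic exploration term is what yields the consistency guarantee mentioned above.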


4 Figures and Tables

