Bandit Based Monte-Carlo Planning

@inproceedings{Kocsis2006BanditBM,
  title={Bandit Based Monte-Carlo Planning},
  author={Levente Kocsis and Csaba Szepesv{\'a}ri},
  booktitle={ECML},
  year={2006}
}
For large state-space Markovian Decision Problems, Monte-Carlo planning is one of the few viable approaches to finding near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algorithm is shown to be consistent, and finite-sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains, UCT is significantly more efficient than its…
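
The abstract's core idea, applying a bandit rule (UCB1) to action selection at each node of a Monte-Carlo search tree, can be sketched briefly. The following is a minimal illustrative sketch, not the paper's reference implementation; the Node class, its interface, and the exploration constant shown are assumptions made for the example.

import math
import random


class Node:
    """One state in the search tree, tracking per-action statistics."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.visits = 0                                       # n: visits to this node
        self.action_visits = {a: 0 for a in self.actions}     # n_a: times action a was tried
        self.action_values = {a: 0.0 for a in self.actions}   # running mean sampled return

    def select_action(self, c=math.sqrt(2)):
        """UCB1 selection: maximize mean + c * sqrt(ln n / n_a)."""
        # Try every action once before applying the bandit formula.
        untried = [a for a in self.actions if self.action_visits[a] == 0]
        if untried:
            return random.choice(untried)
        return max(
            self.actions,
            key=lambda a: self.action_values[a]
            + c * math.sqrt(math.log(self.visits) / self.action_visits[a]),
        )

    def update(self, action, reward):
        """Backpropagation step: fold one sampled return into the running mean."""
        self.visits += 1
        self.action_visits[action] += 1
        n = self.action_visits[action]
        self.action_values[action] += (reward - self.action_values[action]) / n

In a full UCT implementation this selection rule is applied at every visited tree node during a simulation, a random rollout estimates the return from the first unexpanded state, and update propagates the sampled return back up the visited path. The choice of exploration constant and the scaling of rewards matter for the paper's theoretical guarantees; sqrt(2) is a common default, not the paper's prescribed value.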

Citations

1,281 publications cite this paper (estimated 28% coverage).


CITATION STATISTICS

  • 345 highly influenced citations

  • Averaged 153 citations per year over the last 3 years

  • 4% increase in citations per year in 2018 over 2017
