Nested Rollout Policy Adaptation for Monte Carlo Tree Search
@inproceedings{Rosin2011NestedRP,
  title     = {Nested Rollout Policy Adaptation for Monte Carlo Tree Search},
  author    = {Christopher D. Rosin},
  booktitle = {IJCAI},
  year      = {2011}
}
Monte Carlo tree search (MCTS) methods have had recent success in games, planning, and optimization. MCTS uses results from rollouts to guide search; a rollout is a path that descends the tree with a randomized decision at each ply until reaching a leaf. MCTS results can be strongly influenced by the choice of an appropriate policy to bias the rollouts. Most previous work on MCTS uses static uniform random or domain-specific policies. We describe a new MCTS method that dynamically adapts the…
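The abstract above describes rollouts biased by a policy that is adapted during search. As a rough illustration of that idea, here is a minimal NRPA-style sketch in Python: level 0 plays a softmax rollout, and each higher level runs the level below repeatedly while shifting the policy toward the best sequence found. The `ToyState` problem, the `alpha` step size, and the iteration counts are illustrative assumptions, not details from the paper.

```python
import math
import random

class ToyState:
    """Hypothetical toy problem: pick 5 moves from {0, 1, 2}; score is their sum."""
    def __init__(self, seq=()):
        self.seq = seq
    def terminal(self):
        return len(self.seq) == 5
    def legal_moves(self):
        return [0, 1, 2]
    def play(self, move):
        return ToyState(self.seq + (move,))
    def score(self):
        return sum(self.seq)

def playout(state, policy):
    """Level-0 rollout: sample each move with probability proportional to exp(weight)."""
    sequence = []
    while not state.terminal():
        moves = state.legal_moves()
        weights = [math.exp(policy.get(m, 0.0)) for m in moves]
        move = random.choices(moves, weights=weights)[0]
        sequence.append(move)
        state = state.play(move)
    return state.score(), sequence

def adapt(policy, sequence, root_state, alpha=1.0):
    """Shift policy weights toward the best sequence (a gradient-style update)."""
    policy = dict(policy)  # each level keeps its own copy of the policy
    state = root_state
    for move in sequence:
        moves = state.legal_moves()
        z = sum(math.exp(policy.get(m, 0.0)) for m in moves)
        for m in moves:
            # subtract the current selection probability from every legal move...
            policy[m] = policy.get(m, 0.0) - alpha * math.exp(policy.get(m, 0.0)) / z
        # ...then reinforce the move actually taken in the best sequence
        policy[move] = policy.get(move, 0.0) + alpha
        state = state.play(move)
    return policy

def nrpa(level, state, policy, iterations=10):
    """Nested search: each level repeatedly runs the level below and adapts."""
    if level == 0:
        return playout(state, policy)
    best_score, best_seq = float("-inf"), []
    for _ in range(iterations):
        score, seq = nrpa(level - 1, state, policy, iterations)
        if score >= best_score:
            best_score, best_seq = score, seq
        policy = adapt(policy, best_seq, state)
    return best_score, best_seq
```

Calling `nrpa(2, ToyState(), {})` runs 10 iterations at each of two nested levels and returns the best score and move sequence found; on this toy problem the adapted policy quickly concentrates on the high-scoring moves.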
107 Citations
Nested Rollout Policy Adaptation with Selective Policies
- Computer Science, CGW@IJCAI
- 2016
This work proposes to enhance NRPA using more selectivity in the playouts to improve on standard NRPA for all three problems: Bus regulation, SameGame and Weak Schur numbers.
Nested Monte-Carlo Tree Search for Online Planning in Large MDPs
- Computer Science, ECAI
- 2012
This work proposes Nested Monte-Carlo Tree Search (NMCTS), in which MCTS itself is recursively used to provide a rollout policy for higher-level searches, and shows that NMCTS is significantly more effective than regular MCTS at equal time controls, both using random and heuristic rollouts at the base level.
Parallel Nested Rollout Policy Adaptation
- Computer Science, 2019 IEEE Conference on Games (CoG)
- 2019
A parallel version of NRPA is developed that replicates results of the sequential version and allows for deeper calculations, showing that depth of the calculation is a deciding factor in the result quality.
High-Diversity Monte-Carlo Tree Search
- Computer Science
- 2016
High-Diversity NRPA is proposed, which keeps a bounded number of solutions in each recursion level and includes several improvements that further reduce the running time of the algorithm and improve its diversity.
Improved Diversity in Nested Rollout Policy Adaptation
- Computer Science, KI
- 2016
This paper proposes refinements for Beam-NRPA, a variant of nested rollout policy adaptation (NRPA), that improve the runtime and the solution diversity.
Single-Agent Optimization Through Policy Iteration Using Monte-Carlo Tree Search
- Computer Science, ArXiv
- 2020
A search algorithm that uses a variant of MCTS enhanced by a novel action-value normalization mechanism for games with potentially unbounded rewards, a virtual loss function that enables effective search parallelization, and a policy network, trained by generations of self-play, to guide the search.
Single Player Monte-Carlo Tree Search Based on the Plackett-Luce Model
- Computer Science, AAAI
- 2021
Plackett-Luce MCTS (PL-MCTS), a path search algorithm based on a probabilistic model over the qualities of successor nodes, is presented, and it is empirically shown that PL-MCTS is competitive with and often superior to the state of the art.
Monte-Carlo Planning: Theoretically Fast Convergence Meets Practical Efficiency
- Computer Science, UAI
- 2013
This work takes a stand on the individual strengths of these two classes of algorithms and on how they can be effectively connected; a principle of "selective tree expansion" is rationalized, and a concrete implementation of this principle within MCTS is suggested.
Monte-Carlo Fork Search for Cooperative Path-Finding
- Computer Science, CGW@IJCAI
- 2013
Nested MCFS (NMCFS) solves congestion problems from the literature, finding better solutions than the state of the art, and solves N-puzzles without a hole near-optimally.
References
Showing 1–10 of 43 references
On-line Policy Improvement using Monte-Carlo Search
- Computer Science, NIPS
- 1996
A Monte-Carlo simulation algorithm for real-time policy improvement of an adaptive controller and results are reported for a wide variety of initial policies, ranging from a random policy to TD-Gammon, an extremely strong multi-layer neural network.
Monte-Carlo Exploration for Deterministic Planning
- Mathematics, IJCAI
- 2009
Monte-Carlo random walks are used to explore the local neighborhood of a search state for action selection in the forward-chaining planner ARVAND; these walks yield a larger and unbiased sample of the search neighborhood and require state evaluations only at the endpoints of each walk.
Nested Monte-Carlo Search
- Computer Science, IJCAI
- 2009
Nested Monte-Carlo Search addresses the problem of guiding the search toward better states when there is no available heuristic, and uses nested levels of random games to guide the search.
Monte-Carlo simulation balancing
- Computer Science, ICML '09
- 2009
The main idea is to optimise the balance of a simulation policy, so that an accurate spread of simulation outcomes is maintained, rather than optimising the direct strength of the simulation policy.
Nested Monte-Carlo Search with AMAF Heuristic
- Computer Science, 2010 International Conference on Technologies and Applications of Artificial Intelligence
- 2010
In the present study, the All-Moves-As-First (AMAF) heuristic is incorporated into Nested Monte-Carlo Search, and the number of searches is reduced so that higher-level search remains achievable.
Searching Solitaire in Real Time
- Computer Science, J. Int. Comput. Games Assoc.
- 2007
A multistage nested rollout algorithm that allows the user to apply separate heuristics at each stage of the search process and to tune the search magnitude for each stage, together with a search-tree compression that reveals a new state representation for Klondike Solitaire and Thoughtful Solitaire.
Monte-Carlo Planning in Large POMDPs
- Computer Science, NIPS
- 2010
POMCP is the first general-purpose planner to achieve high performance in such large and unfactored POMDPs as 10 x 10 battleship and partially observable PacMan, with approximately 10^18 and 10^56 states respectively.
Approximate Policy Iteration with a Policy Language Bias
- Computer Science, NIPS
- 2003
This work induces high-quality domain-specific planners for classical planning domains by solving such domains as extremely large MDPs, replacing the usual cost-function learning step with a learning step in policy space.
Combining online and offline knowledge in UCT
- Computer Science, ICML '07
- 2007
This work considers three approaches for combining offline and online value functions in the UCT algorithm, and combines these algorithms in MoGo, the world's strongest 9 x 9 Go program, where each technique significantly improves MoGo's playing strength.
UCD: Upper Confidence Bound for Rooted Directed Acyclic Graphs
- Computer Science
- 2010
This paper presents a framework for testing various algorithms that deal with transpositions in Monte-Carlo Tree Search (MCTS), and proposes parameterized ways to compute the mean of the child, the playouts of the parent, and the playouts of the children.