Nested Rollout Policy Adaptation for Monte Carlo Tree Search

@inproceedings{Rosin2011NestedRP,
  title={Nested Rollout Policy Adaptation for Monte Carlo Tree Search},
  author={Christopher D. Rosin},
  booktitle={IJCAI},
  year={2011}
}
  • C. Rosin
  • Published in IJCAI, 16 July 2011
  • Computer Science
Monte Carlo tree search (MCTS) methods have had recent success in games, planning, and optimization. MCTS uses results from rollouts to guide search; a rollout is a path that descends the tree with a randomized decision at each ply until reaching a leaf. MCTS results can be strongly influenced by the choice of appropriate policy to bias the rollouts. Most previous work on MCTS uses static uniform random or domain-specific policies. We describe a new MCTS method that dynamically adapts the… 
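
Although the abstract is cut off above, the method it introduces, Nested Rollout Policy Adaptation (NRPA), has a compact core: each level of a nested recursion runs a number of policy-guided rollouts, remembers the best move sequence found so far, and adapts the rollout policy by a gradient step toward that sequence. Below is a minimal runnable Python sketch on a toy bit-matching problem; the toy domain, the (position, bit) move encoding, and the ITERS constant are illustrative assumptions rather than details from the paper, and ALPHA = 1 is simply the step size commonly used with NRPA.

import math, random

# Toy domain (an illustrative assumption, not from the paper): build a bit
# string of length L; the score is the number of bits matching a hidden target.
L = 12
TARGET = [random.randint(0, 1) for _ in range(L)]
ALPHA = 1.0   # Adapt step size
ITERS = 10    # rollouts per recursion level (arbitrary here)

def code(step, move):
    # Domain-specific move code; here a (position, bit) pair.
    return (step, move)

def playout(policy):
    # Rollout biased by the policy: each move is sampled with probability
    # proportional to exp(weight), i.e. a softmax over the legal move codes.
    seq = []
    for step in range(L):
        weights = [math.exp(policy.get(code(step, m), 0.0)) for m in (0, 1)]
        seq.append(random.choices((0, 1), weights=weights)[0])
    score = sum(1 for s, t in zip(seq, TARGET) if s == t)
    return score, seq

def adapt(policy, seq):
    # Gradient step that shifts probability mass toward the best sequence:
    # raise the weight of each chosen move, lower every legal move in
    # proportion to its current probability.
    new = dict(policy)
    for step, chosen in enumerate(seq):
        z = sum(math.exp(policy.get(code(step, m), 0.0)) for m in (0, 1))
        for m in (0, 1):
            p = math.exp(policy.get(code(step, m), 0.0)) / z
            new[code(step, m)] = new.get(code(step, m), 0.0) - ALPHA * p
        new[code(step, chosen)] += ALPHA
    return new

def nrpa(level, policy):
    if level == 0:
        return playout(policy)
    best_score, best_seq = float("-inf"), None
    for _ in range(ITERS):
        score, seq = nrpa(level - 1, dict(policy))   # recurse on a copy
        if score >= best_score:
            best_score, best_seq = score, seq
        policy = adapt(policy, best_seq)             # pull policy toward best
    return best_score, best_seq

print(nrpa(level=2, policy={}))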

Citations

Nested Rollout Policy Adaptation with Selective Policies
TLDR
This work proposes enhancing NRPA with more selective playout policies, improving on standard NRPA across all three benchmark problems: Bus regulation, SameGame, and Weak Schur numbers.
Nested Monte-Carlo Tree Search for Online Planning in Large MDPs
TLDR
This work proposes Nested Monte-Carlo Tree Search (NMCTS), in which MCTS itself is recursively used to provide a rollout policy for higher-level searches, and shows that NMCTS is significantly more effective than regular MCTS at equal time controls, using both random and heuristic rollouts at the base level. A simplified sketch of this nesting follows.
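
Under simplifying assumptions, the nesting can be read as follows: run an ordinary UCT-style MCTS at each level, but evaluate each leaf with a level-(n-1) search instead of a single random playout, bottoming out in uniform random rollouts at level 0. The sketch below is a simplified single-agent reading on the same toy bit-matching domain as above, not the authors' exact formulation; the Node class, budget, and exploration constant are illustrative assumptions.

import math, random

# Same toy domain as above (an illustrative assumption, not from the paper).
L = 12
TARGET = [random.randint(0, 1) for _ in range(L)]

def reward(seq):
    return sum(1 for s, t in zip(seq, TARGET) if s == t)

class Node:
    def __init__(self, seq):
        self.seq = seq
        self.children = {}   # move -> Node
        self.visits = 0
        self.total = 0.0

def uct_child(node, c=1.4):
    # UCB1 selection over fully expanded children.
    return max(node.children.values(),
               key=lambda ch: ch.total / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def mcts(seq, budget, rollout):
    # One MCTS run from `seq`; `rollout` evaluates leaves. Returns the best
    # complete-sequence reward seen during the search.
    root, best = Node(seq), float("-inf")
    for _ in range(budget):
        node, path = root, [root]
        # Selection: descend while the node is fully expanded.
        while len(node.seq) < L and len(node.children) == 2:
            node = uct_child(node)
            path.append(node)
        # Expansion: add one untried child, unless the node is terminal.
        if len(node.seq) < L:
            move = random.choice([m for m in (0, 1) if m not in node.children])
            child = Node(node.seq + [move])
            node.children[move] = child
            node = child
            path.append(node)
        # Evaluation: this rollout call is where NMCTS nests a search.
        r = rollout(node.seq)
        best = max(best, r)
        # Backpropagation.
        for n in path:
            n.visits += 1
            n.total += r
    return best

def random_rollout(seq):
    return reward(seq + [random.randint(0, 1) for _ in range(L - len(seq))])

def nmcts(seq, level, budget):
    # Base-level rollouts are random; higher levels use a nested search.
    if level == 0:
        return random_rollout(seq)
    return mcts(seq, budget, lambda s: nmcts(s, level - 1, budget))

print(nmcts([], level=2, budget=25))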
Parallel Nested Rollout Policy Adaptation
  • A. Nagórko
  • Computer Science
  • 2019 IEEE Conference on Games (CoG)
TLDR
A parallel version of NRPA is developed that replicates the results of the sequential version and allows for deeper calculations, showing that the depth of the calculation is a deciding factor in result quality.
High-Diversity Monte-Carlo Tree Search
  • T. Cazenave et al.
TLDR
High-Diversity NRPA is proposed, which keeps a bounded number of solutions in each recursion level and includes several improvements that further reduce the running time of the algorithm and improve its diversity.
Improved Diversity in Nested Rollout Policy Adaptation
TLDR
This paper proposes refinements for Beam-NRPA, a beam-search variant of nested rollout with policy adaptation (NRPA), that improve its runtime and solution diversity.
Single-Agent Optimization Through Policy Iteration Using Monte-Carlo Tree Search
TLDR
A search algorithm is presented that uses a variant of MCTS enhanced by a novel action-value normalization mechanism for games with potentially unbounded rewards, a virtual loss function that enables effective search parallelization, and a policy network, trained by generations of self-play, to guide the search.
Playout policy adaptation with move features
Single Player Monte-Carlo Tree Search Based on the Plackett-Luce Model
TLDR
Plackett-Luce MCTS (PL-MCTS), a path search algorithm based on a probabilistic model over the qualities of successor nodes, is presented, and it is empirically shown that PL-MCTS is competitive with and often superior to the state of the art.
Monte-Carlo Planning: Theoretically Fast Convergence Meets Practical Efficiency
TLDR
A stand is taken on the individual strengths of these two classes of algorithms and on how they can be effectively connected; a principle of "selective tree expansion" is rationalized, and a concrete implementation of this principle within MCTS is suggested.
Monte-Carlo Fork Search for Cooperative Path-Finding
TLDR
Nested MCFS (NMCFS) solves congestion problems from the literature, finding better solutions than the state of the art, and solves N-puzzles without a hole near-optimally.

References

SHOWING 1-10 OF 43 REFERENCES
On-line Policy Improvement using Monte-Carlo Search
TLDR
A Monte-Carlo simulation algorithm for real-time policy improvement of an adaptive controller is presented, and results are reported for a wide variety of initial policies, ranging from a random policy to TD-Gammon, an extremely strong multi-layer neural network.
Monte-Carlo Exploration for Deterministic Planning
TLDR
Monte-Carlo random walks are used to explore the local neighborhood of a search state for action selection in the forward-chaining planner ARVAND; these walks yield a larger and unbiased sample of the search neighborhood and require state evaluations only at the endpoints of each walk.
Nested Monte-Carlo Search
TLDR
Nested Monte-Carlo Search addresses the problem of guiding the search toward better states when there is no available heuristic, using nested levels of random games to guide the search, as sketched below.
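
For contrast with NRPA's adaptive policy, here is a minimal runnable sketch of plain Nested Monte-Carlo Search on the same toy bit-matching domain used earlier: at every step, each legal move is evaluated by a level-(n-1) search (a uniformly random playout at level 0), and the search then advances one step along the best sequence found so far. As before, the toy domain is an illustrative assumption.

import random

# Same toy domain as above (an illustrative assumption, not from the paper).
L = 12
TARGET = [random.randint(0, 1) for _ in range(L)]

def reward(seq):
    return sum(1 for s, t in zip(seq, TARGET) if s == t)

def playout(seq):
    # Level-0 search: complete the sequence with uniformly random moves.
    seq = seq + [random.randint(0, 1) for _ in range(L - len(seq))]
    return reward(seq), seq

def nmcs(seq, level):
    # Nested Monte-Carlo Search: evaluate every legal move with a
    # level-(n-1) search and keep following the best sequence found.
    if level == 0:
        return playout(seq)
    best_score, best_seq = float("-inf"), None
    while len(seq) < L:
        for move in (0, 1):
            score, s = nmcs(seq + [move], level - 1)
            if score > best_score:
                best_score, best_seq = score, s
        seq = best_seq[:len(seq) + 1]   # advance one step along the best line
    return best_score, best_seq

print(nmcs([], level=2))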
Monte-Carlo simulation balancing
TLDR
The main idea is to optimise the balance of a simulation policy, so that an accurate spread of simulation outcomes is maintained, rather than optimising the direct strength of the simulation policy.
Nested Monte-Carlo Search with AMAF Heuristic
TLDR
In the present study, the All-Moves-As-First (AMAF) heuristic is incorporated into Nested Monte-Carlo Search, and the number of searches is reduced, while a pseudo count of searches is maintained, in order to achieve higher-level search.
Searching Solitaire in Real Time
TLDR
A multistage nested rollout algorithm is presented that allows the user to apply separate heuristics at each stage of the search process and to tune the search magnitude for each stage; a search-tree compression is also proposed that reveals a new state representation for Klondike Solitaire and Thoughtful Solitaire.
Monte-Carlo Planning in Large POMDPs
TLDR
POMCP is the first general-purpose planner to achieve high performance in such large and unfactored POMDPs as 10×10 Battleship and partially observable PacMan, with approximately 10^18 and 10^56 states respectively.
Approximate Policy Iteration with a Policy Language Bias
TLDR
This work induces high-quality domain-specific planners for classical planning domains by solving such domains as extremely large MDPs, replacing the usual cost-function learning step with a learning step in policy space.
Combining online and offline knowledge in UCT
TLDR
This work considers three approaches for combining offline and online value functions in the UCT algorithm and combines these algorithms in MoGo, the world's strongest 9×9 Go program, where each technique significantly improves MoGo's playing strength.
UCD: Upper Confidence Bound for Rooted Directed Acyclic Graphs
TLDR
This paper presents a framework for testing various algorithms that deal with transpositions in Monte-Carlo Tree Search (MCTS), and proposes parameterized ways to compute the mean of the child, the playouts of the parent, and the playouts of the children.