• Corpus ID: 19120520

Memory-Augmented Monte Carlo Tree Search

@inproceedings{Xiao2018MemoryAugmentedMC,
  title={Memory-Augmented Monte Carlo Tree Search},
  author={Chenjun Xiao and Jincheng Mei and Martin M{\"u}ller},
  booktitle={AAAI},
  year={2018}
}
This paper proposes and evaluates Memory-Augmented Monte Carlo Tree Search (M-MCTS), which provides a new approach to exploit generalization in online real-time search. The key idea of M-MCTS is to incorporate MCTS with a memory structure, where each entry contains information about a particular state. This memory is used to generate an approximate value estimate by combining the estimates of similar states. We show that the memory-based value approximation is better than the vanilla Monte Carlo estimation with high probability under mild conditions. We evaluate M-MCTS in the game of Go. Experimental results show that M-MCTS…
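The combination step the abstract describes can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation; the function name, the cosine-similarity kernel, and the softmax temperature `tau` are all assumptions made for the example.

```python
import numpy as np

def memory_value_estimate(phi_s, memory, k=5, tau=0.1):
    """Approximate a state's value by combining the value estimates of
    its most similar states in memory (illustrative sketch only).

    phi_s  -- feature vector of the query state
    memory -- list of (feature_vector, mean_value) entries
    """
    # Cosine similarity between the query state and each stored state.
    sims = np.array([
        phi_s @ phi / (np.linalg.norm(phi_s) * np.linalg.norm(phi) + 1e-8)
        for phi, _ in memory
    ])
    top = np.argsort(sims)[-k:]      # indices of the k most similar entries
    w = np.exp(sims[top] / tau)      # softmax weights over similarities
    w /= w.sum()
    vals = np.array([memory[i][1] for i in top])
    return float(w @ vals)           # similarity-weighted value estimate
```

In M-MCTS, such a memory-based estimate is combined with a node's own Monte Carlo statistics, so the search can lean on similar states while a node still has few visits of its own.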

Citations

Maximum Entropy Monte-Carlo Planning
TLDR
It is proved that the probability of MENTS failing to identify the best decision at the root decays exponentially, which fundamentally improves the polynomial convergence rate of UCT.
Monte Carlo Tree Search With Iteratively Refining State Abstractions
TLDR
This work presents a method, called abstraction refining, for extending MCTS to stochastic environments; unlike progressive widening, it leverages the geometry of the state space, which the authors argue can offer advantages.
ME-MCTS: Online Generalization by Combining Multiple Value Estimators
TLDR
The proposed Multiple Estimator Monte Carlo Tree Search (ME-MCTS) introduces a formalization of online generalization that can represent existing techniques such as the history heuristic, RAVE, or OMA: contextual action-value estimators or abstractors that generalize across specific contexts.
Enhancing the Monte Carlo Tree Search Algorithm for Video Game Testing
TLDR
This study extends the MCTS agent with several modifications for game-testing purposes and presents a novel tree reuse strategy that improves the bug-finding performance of the agents.
UCT-ADP Progressive Bias Algorithm for Solving Gomoku
  • Xu Cao, Yanghao Lin
  • Computer Science
    2019 IEEE Symposium Series on Computational Intelligence (SSCI)
  • 2019
TLDR
This framework uses UCT to balance exploration and exploitation of Gomoku game trees, applies powerful pruning strategies and a heuristic function to re-select the available 2-adjacent grids of the state, and uses ADP instead of simulation to give estimated values of expanded nodes.
Parallel Tracking and Reconstruction of States in Heuristic Optimization Systems on GPUs
TLDR
This paper proposes a new general high-level approach to tracking and reconstructing states in heuristic optimization systems on GPUs that has considerably lower memory consumption than traditional approaches and scales well with the complexity of the optimization problem.
Learning compositional programs with arguments and sampling
TLDR
A state-of-the-art model, AlphaNPI, is extended by learning to generate functions that can accept arguments, and an approximate version of Monte Carlo Tree Search (A-MCTS) is investigated to speed up convergence.
Fast deep reinforcement learning using online adjustments from the past
TLDR
EVA shifts the value predicted by a neural network using an estimate of the value function found by prioritised sweeping over experience tuples from the replay buffer near the current state, allowing deep reinforcement learning agents to rapidly adapt to the experience in their replay buffer.
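Schematically, the "shift" is a convex combination of the parametric and non-parametric estimates. The notation below follows the sentence above rather than the paper's exact symbols; λ is the mixing weight:

```latex
Q(s, a) = \lambda\, Q_\theta(s, a) + (1 - \lambda)\, Q_{\mathrm{NP}}(s, a)
```

Here Q_θ is the network's prediction and Q_NP is the estimate assembled from experience tuples near the current state in the replay buffer.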
An Evaluation of Monte-Carlo Tree Search for Property Falsification on Hybrid Flight Control Laws
TLDR
An evaluation of a simple Monte-Carlo Tree Search property falsification algorithm, applied to select properties of a longitudinal hybrid flight control law: a threshold overshoot property, two frequential properties, and a discrete event-based property.
Deep Reinforcement Learning
  • Yuxi Li
  • Computer Science
    Reinforcement Learning for Cyber-Physical Systems
  • 2019
TLDR
This work discusses deep reinforcement learning in an overview style, focusing on contemporary work and its historical context, with background on artificial intelligence, machine learning, deep learning, and reinforcement learning (RL), along with pointers to resources.

References

Showing 1-10 of 26 references
Monte-Carlo tree search and rapid action value estimation in computer Go
Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search
TLDR
A new framework is presented that combines tree search with Monte-Carlo evaluation without separating a min-max phase from a Monte-Carlo phase; it provides fine-grained control of tree growth at the level of individual simulations and allows efficient selectivity.
Bandit Based Monte-Carlo Planning
TLDR
A new algorithm, UCT, is introduced that applies bandit ideas to guide Monte-Carlo planning; it is shown to be consistent, and finite-sample bounds are derived on the estimation error due to sampling.
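The bandit rule UCT applies at each tree node is UCB1. A minimal sketch of the resulting selection step, assuming an illustrative node structure with `visits`, `total_value`, and `children` fields, and assuming unvisited children are expanded before this rule is used:

```python
import math

def uct_select(node, c=math.sqrt(2)):
    """Pick the child maximizing average return plus a UCB1 exploration
    bonus that shrinks as the child accumulates visits."""
    return max(
        node.children,
        key=lambda ch: ch.total_value / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )
```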
Temporal-difference search in computer Go
TLDR
This work applies temporal-difference search to the game of 9×9 Go, using a million binary features matching simple patterns of stones, and outperforms an unenhanced Monte-Carlo tree search with the same number of simulations.
Transpositions and move groups in Monte Carlo tree search
TLDR
From the experimental results, it is concluded that both exploiting the graph structure and grouping moves may contribute to an increase in the playing strength of game programs using UCT.
Improving Exploration in UCT Using Local Manifolds
TLDR
This paper improves exploration in UCT by generalizing across similar states using a given distance metric, and shows how to learn a local manifold from the transition graph of states in the near future.
Mastering the game of Go with deep neural networks and tree search
TLDR
Using this search algorithm, the program AlphaGo achieved a 99.8% winning rate against other Go programs and defeated the human European Go champion by 5 games to 0, the first time a computer program has defeated a human professional player in the full-sized game of Go.
Reinforcement Learning with Deep Energy-Based Policies
TLDR
A method is proposed for learning expressive energy-based policies for continuous states and actions, which was previously feasible only in tabular domains, and a new algorithm, called soft Q-learning, is applied that expresses the optimal policy via a Boltzmann distribution.
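The Boltzmann form of the optimal policy mentioned here can be written out explicitly. The notation below (temperature α, soft Q- and value functions) follows the maximum-entropy RL literature in general rather than this paper's exact symbols:

```latex
\pi^*(a \mid s) \propto \exp\!\Big(\tfrac{1}{\alpha}\, Q^*_{\mathrm{soft}}(s, a)\Big),
\qquad
V^*_{\mathrm{soft}}(s) = \alpha \log \int_{\mathcal{A}} \exp\!\Big(\tfrac{1}{\alpha}\, Q^*_{\mathrm{soft}}(s, a)\Big)\, da
```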
A General Solution to the Graph History Interaction Problem
TLDR
This paper presents a practical solution to the GHI problem that combines and extends previous techniques and is applicable to different game tree search algorithms and to different domains.
Combining online and offline knowledge in UCT
TLDR
This work considers three approaches for combining offline and online value functions in the UCT algorithm, and combines them in MoGo, the world's strongest 9×9 Go program, where each technique significantly improves MoGo's playing strength.