On Monte Carlo Tree Search and Reinforcement Learning

@article{Vodopivec2017OnMC,
  title={On Monte Carlo Tree Search and Reinforcement Learning},
  author={Tom Vodopivec and Spyridon Samothrakis and Branko Ster},
  journal={J. Artif. Intell. Res.},
  year={2017},
  volume={60},
  pages={881-936}
}
Fuelled by successes in Computer Go, Monte Carlo tree search (MCTS) has achieved widespread adoption within the games community. Its links to traditional reinforcement learning (RL) methods have been outlined in the past; however, the use of RL techniques within tree search has not been thoroughly studied yet. In this paper we re-examine in depth this close relation between the two fields; our goal is to improve the cross-awareness between the two communities. We show that a straightforward… 
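As an illustrative aside on the MCTS-RL connection the paper examines, here is a minimal sketch (names and structure are assumptions, not taken from the paper) contrasting the standard Monte Carlo averaging backup of UCT with a TD-style backup that bootstraps each node's target from the node below it:

```python
class Node:
    def __init__(self):
        self.visits = 0
        self.value = 0.0

def monte_carlo_backup(path, ret):
    """Classic UCT backup: every node on the simulated path moves
    toward the running average of simulation returns."""
    for node in path:
        node.visits += 1
        node.value += (ret - node.value) / node.visits

def td_backup(path, ret, alpha=0.1, gamma=1.0):
    """A TD-flavoured alternative: fixed step size, bootstrapping from
    the value of the node below (path runs root -> leaf)."""
    target = ret
    for node in reversed(path):
        node.visits += 1
        node.value += alpha * (gamma * target - node.value)
        target = node.value  # parent bootstraps from this node
```

With alpha fixed rather than 1/visits, recent simulations carry more weight, which is one of the RL-style substitutions this line of work studies.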
Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL
TLDR
This paper augments the Asynchronous Advantage Actor-Critic method with a novel self-supervised auxiliary task, yielding A3C-TP, and proposes a new framework where planning algorithms such as Monte Carlo tree search, or other sources of (simulated) demonstrators, can be integrated into asynchronous distributed DRL methods.
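One plausible way to integrate such a demonstrator, sketched here under assumed shapes (the function name, arguments, and the plain cross-entropy term are illustrative, not the paper's exact loss), is an auxiliary supervised term pulling the network's policy toward the demonstrator's action:

```python
import numpy as np

def demonstrator_aux_loss(policy_probs, demo_action):
    """Cross-entropy between the network's policy (a probability vector
    over actions) and the action an MCTS demonstrator chose; added to
    the usual actor-critic loss. Illustrative sketch only."""
    return -np.log(policy_probs[demo_action] + 1e-8)
```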
Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates
TLDR
It is argued that such an extent of exploration is undesirable, and a novel objective function for training policies that are not exploratory is proposed; this objective can be estimated using MCTS value estimates, rather than MCTS visit counts.
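To make the distinction concrete, here is a hedged sketch (the linear-softmax parameterisation and all names are assumptions, not the paper's exact objective) of a policy-gradient step whose learning signal is MCTS value estimates rather than a visit-count distribution:

```python
import numpy as np

def policy_gradient_step(theta, features, q_mcts, lr=0.01):
    """One gradient step on E_pi[Q] for a softmax policy over linear
    features, using MCTS value estimates q_mcts as the signal.
    theta: (d,) weights; features: (n_actions, d); q_mcts: (n_actions,)."""
    logits = features @ theta
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # grad E_pi[Q] = sum_a pi(a) * Q(a) * (phi(a) - E_pi[phi])
    baseline = probs @ features  # expected feature vector under pi
    grad = ((probs * q_mcts)[:, None] * (features - baseline)).sum(axis=0)
    return theta + lr * grad
```

Training on Q-value estimates rather than matching visit counts removes the exploratory component that UCB-driven visit distributions carry.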
Action Guidance with MCTS for Deep Reinforcement Learning
TLDR
This paper proposes a new framework where even a non-expert simulated demonstrator, e.g., a planning algorithm such as Monte Carlo tree search with a small number of rollouts, can be integrated within asynchronous distributed deep reinforcement learning methods.
Safer Deep RL with Shallow MCTS: A Case Study in Pommerman
TLDR
This paper exemplifies and analyzes the high rate of catastrophic events that occur under random exploration in a domain with sparse, delayed, and deceptive rewards, the recently proposed multi-agent benchmark of Pommerman, and proposes a new framework where even a non-expert simulated demonstrator can be integrated into asynchronous distributed deep reinforcement learning methods.
A novel real-time design for fighting game AI
TLDR
The experimental results show that this approach to real-time fighting games, tested in the fighting game AI challenge, yields an AI that outperforms pure Monte-Carlo tree search and classic algorithms such as evolutionary algorithms and deep reinforcement learning.
A survey and critique of multiagent deep reinforcement learning
TLDR
A clear overview of the current multiagent deep reinforcement learning (MDRL) literature is provided, to help unify and motivate future research that takes advantage of the abundant existing literature, in a joint effort to promote fruitful work in the multiagent community.
Quarto as a Reinforcement Learning problem
Reinforcement Learning has proven itself through a recent history of superhuman-level performance in various tasks. Board games are of particular interest because they can be simulated completely, …
A Survey of Planning and Learning in Games
TLDR
This paper presents a survey of the multiple methodologies proposed to integrate planning and learning in the context of games, covering both their theoretical foundations and their applications, and also reviews learning and planning techniques commonly used in games.
Backplay: "Man muss immer umkehren"
TLDR
The approach, Backplay, uses a single demonstration to construct a curriculum for a given task; the authors analytically characterize the types of environments where Backplay can improve training speed, and show that it compares favorably to other competitive methods known to improve sample efficiency.
Chapter 4: Frontiers of GVGAI Planning
Multiple studies have tackled the problem presented in the planning track of GVGAI. This chapter aims to present the state of the art on GVGAI, describing the most successful approaches on this …

References

Showing 1-10 of 116 references
Learning non-random moves for playing Othello: Improving Monte Carlo Tree Search
TLDR
Temporal Difference Learning (TDL) is employed as a general approach to integrating domain-specific knowledge into MCTS, learning a linear function approximator that serves as an a priori bias for move selection in the algorithm's default policy.
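A minimal sketch of the general idea, assuming a linear value function over hand-crafted board features (function names and the softmax bias are illustrative, not the paper's exact design):

```python
import numpy as np

def td0_update(w, phi_s, phi_next, reward, alpha=0.01, gamma=1.0,
               terminal=False):
    """TD(0) update for a linear value function V(s) = w . phi(s),
    learned from played positions."""
    v_next = 0.0 if terminal else w @ phi_next
    delta = reward + gamma * v_next - w @ phi_s  # TD error
    return w + alpha * delta * phi_s

def biased_default_policy(moves, feature_fn, w, rng):
    """Rollout move selection softmax-biased by the learned values,
    instead of uniformly random play."""
    values = np.array([w @ feature_fn(m) for m in moves])
    probs = np.exp(values - values.max())
    probs /= probs.sum()
    return moves[rng.choice(len(moves), p=probs)]
```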
On the Analysis of Complex Backup Strategies in Monte Carlo Tree Search
TLDR
It is demonstrated that in some probabilistic planning benchmarks from the International Planning Competition (IPC), selecting an MCTS variant with a backup strategy different from Monte Carlo averaging can lead to substantially better results.
Monte Carlo tree search with temporal-difference learning for general video game playing
TLDR
Experiments show that the proposed modifications significantly improve the performance of MCTS in GVGP, and that the application of reinforcement learning techniques in this domain is a promising subject for further research.
Temporal-difference search in computer Go
TLDR
This work applies temporal-difference search to the game of 9×9 Go, using a million binary features matching simple patterns of stones, and outperforms an unenhanced Monte-Carlo tree search with the same number of simulations.
Backpropagation Modification in Monte-Carlo Game Tree Search
Fan Xie, Zhiqing Liu. 2009 Third International Symposium on Intelligent Information Technology Application, 2009.
TLDR
A new method is presented that improves the performance of the UCT algorithm by increasing the feedback value of later simulations; experimental results on the classical game of Go show that the approach significantly increases the performance of Monte-Carlo simulations when exponential models are used.
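For intuition, a sketch of such a backup, where simulation i carries weight base**i so that later, better-informed simulations count more (the constant and names are assumptions, not the paper's settings):

```python
def weighted_backup(value, total_weight, ret, sim_index, base=1.001):
    """Incremental weighted-mean backup: simulation sim_index carries
    weight base**sim_index, so later simulations dominate the estimate.
    Returns the updated (value, total_weight) pair."""
    w = base ** sim_index
    total_weight += w
    value += w * (ret - value) / total_weight
    return value, total_weight
```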
Mastering the game of Go with deep neural networks and tree search
TLDR
Using this search algorithm, the program AlphaGo achieved a 99.8% winning rate against other Go programs and defeated the human European Go champion by 5 games to 0, the first time that a computer program has defeated a human professional player in the full-sized game of Go.
Monte Carlo Tree Search with heuristic evaluations using implicit minimax backups
TLDR
This paper proposes a new way to use heuristic evaluations to guide the MCTS search: the two sources of information, estimated win rates and heuristic evaluations, are stored separately, and using implicit minimax backups is shown to lead to stronger play performance in Kalah, Breakthrough, and Lines of Action.
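The blend used at selection time can be sketched as follows (the weight alpha = 0.4 is an illustrative value; the paper tunes this per domain):

```python
def combined_value(win_rate, implicit_minimax, alpha=0.4):
    """Implicit minimax backups keep the Monte Carlo win rate and the
    minimax-backed-up heuristic evaluation as separate node statistics,
    blending them only when computing the selection value."""
    return (1.0 - alpha) * win_rate + alpha * implicit_minimax
```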
Thinking Fast and Slow with Deep Learning and Tree Search
TLDR
This paper presents Expert Iteration (ExIt), a novel reinforcement learning algorithm which decomposes the problem into separate planning and generalisation tasks, and shows that ExIt outperforms REINFORCE for training a neural network to play the board game Hex, and the final tree search agent, trained tabula rasa, defeats MoHex 1.0.
A Survey of Monte Carlo Tree Search Methods
TLDR
A survey of the literature to date on Monte Carlo tree search, intended to provide a snapshot of the state of the art after the first five years of MCTS research; it outlines the core algorithm's derivation, imparts some structure on the many variations and enhancements that have been proposed, and summarizes results from the key game and non-game domains.
Scalable and Efficient Bayes-Adaptive Reinforcement Learning Based on Monte-Carlo Tree Search
TLDR
This paper introduces a tractable, sample-based method for approximate Bayes-optimal planning that exploits Monte-Carlo tree search, enabling it to outperform previous Bayesian model-based reinforcement learning algorithms by a significant margin on several well-known benchmark problems.