# Action Guidance with MCTS for Deep Reinforcement Learning

@inproceedings{Kartal2019ActionGW,
title={Action Guidance with MCTS for Deep Reinforcement Learning},
author={Bilal Kartal and Pablo Hernandez-Leal and Matthew E. Taylor},
booktitle={Artificial Intelligence and Interactive Digital Entertainment Conference},
year={2019}
}
*Published in the Artificial Intelligence and Interactive Digital Entertainment Conference, 25 July 2019. Computer Science.*
Deep reinforcement learning has achieved great successes in recent years; however, one main challenge is sample inefficiency. We propose a new framework in which even a non-expert simulated demonstrator, e.g., a planning algorithm such as Monte Carlo tree search with a small number of rollouts, can be integrated within asynchronous distributed deep reinforcement learning methods. Compared to a vanilla deep RL algorithm, our proposed methods both learn faster and converge to better policies on a…
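The action-guidance idea above can be illustrated with a hedged sketch: an auxiliary cross-entropy term that nudges the learner's policy toward the (possibly non-expert) demonstrator's chosen action, added to the ordinary actor-critic loss. The function name, the weight `lam`, and the toy inputs are illustrative assumptions, not the paper's exact formulation.

```python
import math

def action_guidance_loss(policy_probs, demo_action, actor_loss, lam=0.5):
    """Combine a base actor-critic loss with an auxiliary cross-entropy
    term toward the demonstrator's chosen action (illustrative sketch).

    policy_probs : action probabilities from the learner's policy
    demo_action  : index of the action the MCTS demonstrator picked
    actor_loss   : the ordinary policy-gradient loss for this step
    lam          : weight of the guidance term (assumed name and value)
    """
    # Cross-entropy toward the demonstrator's action, clamped for safety.
    aux = -math.log(max(policy_probs[demo_action], 1e-8))
    return actor_loss + lam * aux

# Toy example: the demonstrator picked action 2, which the current policy
# deems unlikely, so the auxiliary term is large and pushes learning there.
loss = action_guidance_loss([0.7, 0.2, 0.1], demo_action=2, actor_loss=1.0)
```

With `lam=0` the sketch reduces to the plain actor-critic loss, which matches the "vanilla deep RL" baseline the abstract compares against.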
## 11 Citations

- *ArXiv, 2022.* This work explores on-line adaptivity in MCTS without requiring pre-training, and demonstrates the new approach on the game miniXCOM, a simplified version of XCOM, a popular commercial franchise consisting of several turn-based tactical games.
- *AAAI, 2020.* This work proposes Requesting Confidence-Moderated Policy advice (RCMP), an action-advising framework where the agent asks for advice when its epistemic uncertainty is high for a certain state, and describes a technique to estimate the agent's uncertainty by performing minor modifications in standard value-function-based RL methods.
- *ICLR, 2020.* By combining real experience with information computed during search, SAVE demonstrates that it is possible to improve on both the performance of model-free learning and the computational cost of planning.
- *2020.* Stable Emergent Policy (STEP) approximation is proposed, a distributed policy iteration scheme to stably approximate decentralized policies for partially observable and cooperative multi-agent systems, and its performance is compared with state-of-the-art multi-agent reinforcement learning algorithms.
- *Artificial Intelligence Review, 2022.* In more complex games (e.g. those with a high branching factor or real-time ones), an efficient MCTS application often requires problem-dependent modification or integration with other techniques; such domain-specific modifications and hybrid approaches are the main focus of this survey.
- *IEEE Transactions on Information Forensics and Security, 2021.* This paper combines Monte Carlo Tree Search (MCTS) with a steganalyzer-based environment model and proposes MCTSteg, an automatic non-additive steganographic distortion learning framework, which is the first reported universal non-additive steganographic framework that works in both spatial and JPEG domains.
- *AIIDE Workshops, 2020.* This paper proposes an approach that aims at leveraging efficient planning to achieve similar results to emergent narrative, using Monte Carlo Tree Search and efficient data structures, and shows that competitive, collaborative and sustainable behaviors emerge in this system, without the explicit definition of such behaviors.
- *2021.* A “seed mutation tree” is designed by investigating and leveraging the mutation relationships among seeds, and the seed scheduling problem is further modeled as a Monte Carlo tree search (MCTS) problem.
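Several of the citing works above start from the standard UCT child-selection rule and then modify it for their domain. As a point of reference, here is a minimal sketch of the textbook UCB1-style UCT score; the function name and the exploration constant `c` are illustrative and not taken from any particular paper listed here.

```python
import math

def uct_score(child_value_sum, child_visits, parent_visits, c=1.414):
    """Textbook UCT selection score: an exploitation term (mean value)
    plus an exploration bonus that shrinks as a child is visited more."""
    if child_visits == 0:
        return float("inf")  # always expand unvisited children first
    exploit = child_value_sum / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore
```

During selection, MCTS descends the tree by picking, at each node, the child with the highest such score; the domain-specific variants surveyed above typically replace or reweight the exploration term.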

## References

Showing 1–10 of 42 references.

- The understanding of the problem is shared, and possible ways to alleviate the sample cost of reinforcement learning are discussed, from the aspects of exploration, optimization, environment modeling, experience transfer, and abstraction.
- *IEEE Signal Processing Magazine, 2017.* This survey covers central algorithms in deep RL, including the deep Q-network (DQN), trust region policy optimization (TRPO), and asynchronous advantage actor-critic, and highlights the unique advantages of deep neural networks, focusing on visual understanding via RL.
- *Autonomous Agents and Multi-Agent Systems, 2019.* A clear overview of current multiagent deep reinforcement learning (MDRL) literature is provided to help unify and motivate future research to take advantage of the abundant literature that exists in a joint effort to promote fruitful research in the multiagent community.
- *ICLR, 2017.* This paper significantly outperforms the previous state of the art on Atari, averaging 880% expert human performance, and on a challenging suite of first-person, three-dimensional *Labyrinth* tasks, leading to a mean speedup in learning of 10× and averaging 87% expert human performance on Labyrinth.
- *AAAI, 2018.* This paper presents Deep Q-learning from Demonstrations (DQfD), an algorithm that leverages even relatively small amounts of demonstration data to massively accelerate the learning process, and that can automatically assess the necessary ratio of demonstration data while learning thanks to a prioritized replay mechanism.
- *NIPS, 2014.* The central idea is to use slow planning-based agents to provide training data for a deep-learning architecture capable of real-time play; new agents based on this idea are proposed and shown to outperform DQN.
- *ICML, 2016.* A conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers, showing that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using visual input.
- *ArXiv, 2018.* This article provides a clear overview of current multiagent deep reinforcement learning (MDRL) literature and provides guidelines to complement this emerging area, showcasing examples of how methods and algorithms from DRL and multiagent learning (MAL) have helped solve problems in MDRL, and providing general lessons learned from these works.
- *QEST, 2019.* WiseMove is presented, a software framework to investigate safe deep reinforcement learning in the context of motion planning for autonomous driving; it adopts a modular learning architecture that suits the current research questions and can be adapted to new technologies and new questions.
- *AISTATS, 2011.* This paper proposes a new iterative algorithm that trains a stationary deterministic policy, which can be seen as a no-regret algorithm in an online learning setting, and demonstrates that this new approach outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem.
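The last reference describes an iterative imitation scheme in the DAgger style: roll out the current learner, label the visited states with the expert's actions, aggregate, and retrain on the growing dataset. A minimal hedged sketch of that loop, with all function names (`dagger_sketch`, `learn`, `rollout`) chosen for illustration rather than taken from the paper:

```python
def dagger_sketch(expert_policy, learn, rollout, iterations=3):
    """Minimal DAgger-style loop (illustrative, not the paper's code).

    expert_policy : callable state -> action, used to label visited states
    learn         : callable dataset -> policy, trains on aggregated pairs
    rollout       : callable policy -> list of visited states
    """
    dataset = []
    policy = expert_policy  # first iteration effectively follows the expert
    for _ in range(iterations):
        states = rollout(policy)                            # run learner
        dataset += [(s, expert_policy(s)) for s in states]  # expert labels
        policy = learn(dataset)                             # train on aggregate
    return policy
```

The key property is that training always happens on the full aggregated dataset, which is what gives the no-regret guarantee the reference mentions.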