Action Guidance with MCTS for Deep Reinforcement Learning

Bilal Kartal, Pablo Hernandez-Leal, Matthew E. Taylor
Artificial Intelligence and Interactive Digital Entertainment Conference

Deep reinforcement learning has achieved great success in recent years; however, one main challenge is its sample inefficiency. We propose a new framework where even a non-expert simulated demonstrator, e.g., a planning algorithm such as Monte Carlo tree search with a small number of rollouts, can be integrated within asynchronous distributed deep reinforcement learning methods. Compared to a vanilla deep RL algorithm, our proposed methods both learn faster and converge to better policies on a…

Exploring Adaptive MCTS with TD Learning in miniXCOM

This work explores on-line adaptivity in MCTS without requiring pre-training, and demonstrates the new approach on the game miniXCOM, a simplified version of XCOM, a popular commercial franchise consisting of several turn-based tactical games.

Uncertainty-Aware Action Advising for Deep Reinforcement Learning Agents

This work proposes Requesting Confidence-Moderated Policy advice (RCMP), an action-advising framework where the agent asks for advice when its epistemic uncertainty is high for a certain state, and describes a technique to estimate the agent uncertainty by performing minor modifications in standard value-function-based RL methods.

Combining Q-Learning and Search with Amortized Value Estimates

By combining real experience with information computed during search, SAVE demonstrates that it is possible to improve on both the performance of model-free learning and the computational cost of planning.


A Distributed Policy Iteration Scheme for Cooperative Multi-Agent Policy Approximation

This work proposes Stable Emergent Policy (STEP) approximation, a distributed policy iteration scheme to stably approximate decentralized policies for partially observable and cooperative multi-agent systems, and compares its performance with state-of-the-art multi-agent reinforcement learning algorithms.

Monte Carlo Tree Search: A Review of Recent Modifications and Applications

In more complex games (e.g., those with a high branching factor or real-time ones), an efficient MCTS application often requires problem-dependent modification or integration with other techniques. Such domain-specific modifications and hybrid approaches are the main focus of this survey.

MCTSteg: A Monte Carlo Tree Search-Based Reinforcement Learning Framework for Universal Non-Additive Steganography

This paper combines Monte Carlo Tree Search (MCTS) with a steganalyzer-based environmental model and proposes an automatic non-additive steganographic distortion learning framework called MCTSteg, the first reported universal non-additive steganographic framework that works in both spatial and JPEG domains.

Leveraging Efficient Planning and Lightweight Agent Definition: A Novel Path Towards Emergent Narrative

This paper proposes an approach that aims at leveraging efficient planning to achieve similar results to emergent narrative, using Monte Carlo Tree Search and efficient data structures, and shows that competitive, collaborative and sustainable behaviors emerge in this system, without the explicit definition of such behaviors.

Flexible Charging Optimization for Electric Vehicles using MDPs-based Online Algorithms

Evolutionary Mutation-based Fuzzing as Monte Carlo Tree Search

A “seed mutation tree” is designed by investigating and leveraging the mutation relationships among seeds, and the seed scheduling problem is further modeled as a Monte-Carlo Tree Search (MCTS) problem.



Towards Sample Efficient Reinforcement Learning

The understanding of the problem is shared, and possible ways to alleviate the sample cost of reinforcement learning are discussed from the aspects of exploration, optimization, environment modeling, experience transfer, and abstraction.

Deep Reinforcement Learning: A Brief Survey

This survey will cover central algorithms in deep RL, including the deep Q-network (DQN), trust region policy optimization (TRPO), and asynchronous advantage actor critic, and highlight the unique advantages of deep neural networks, focusing on visual understanding via RL.

A survey and critique of multiagent deep reinforcement learning

A clear overview of current multiagent deep reinforcement learning (MDRL) literature is provided to help unify and motivate future research to take advantage of the abundant literature that exists in a joint effort to promote fruitful research in the multiagent community.

Reinforcement Learning with Unsupervised Auxiliary Tasks

This paper significantly outperforms the previous state of the art on Atari, averaging 880% expert human performance, and on a challenging suite of first-person, three-dimensional Labyrinth tasks, leading to a mean speedup in learning of 10× and averaging 87% expert human performance on Labyrinth.

Deep Q-learning From Demonstrations

This paper presents an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages small sets of demonstration data to massively accelerate the learning process and is able to automatically assess the necessary ratio of demonstration data while learning thanks to a prioritized replay mechanism.

Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning

The central idea is to use slow planning-based agents to provide training data for a deep-learning architecture capable of real-time play; new agents based on this idea are proposed and shown to outperform DQN.

Asynchronous Methods for Deep Reinforcement Learning

A conceptually simple and lightweight framework for deep reinforcement learning is proposed that uses asynchronous gradient descent for optimization of deep neural network controllers; asynchronous actor-critic is shown to succeed on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using a visual input.

Is multiagent deep reinforcement learning the answer or the question? A brief survey

This article provides a clear overview of current multiagent deep reinforcement learning (MDRL) literature and offers guidelines to complement this emerging area by showcasing examples of how methods and algorithms from DRL and multiagent learning (MAL) have helped solve problems in MDRL and by providing general lessons learned from these works.

WiseMove: A Framework for Safe Deep Reinforcement Learning for Autonomous Driving

WiseMove is presented, a software framework to investigate safe deep reinforcement learning in the context of motion planning for autonomous driving that adopts a modular learning architecture that suits the current research questions and can be adapted to new technologies and new questions.

A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning

This paper proposes a new iterative algorithm that trains a stationary deterministic policy and can be seen as a no-regret algorithm in an online learning setting, and demonstrates that this new approach outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem.