Generalising Discrete Action Spaces with Conditional Action Trees

@inproceedings{Bamford2021GeneralisingDA,
  title={Generalising Discrete Action Spaces with Conditional Action Trees},
  author={Christopher Bamford and Alvaro Ovalle},
  booktitle={2021 IEEE Conference on Games (CoG)},
  year={2021},
  pages={1--8}
}
Reinforcement learning (RL) environments follow relatively few conventions for structuring their action spaces. As a consequence, applying RL algorithms to tasks with large, multi-component action spaces requires additional effort to adapt to each environment's format. In this paper we introduce Conditional Action Trees with two main objectives: (1) to provide a method of structuring action spaces in RL that generalises across several action space specifications, and (2) to formalise a… 
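One way to picture a conditional action tree is as a tree whose levels are the components of an action (for example, which unit acts, which action type it uses, and which direction or target it picks) and whose root-to-leaf paths are exactly the valid actions. The following is a minimal sketch under that assumption, not the paper's implementation: the class `ActionNode` and its methods `add_path`, `mask`, and `flatten` are hypothetical names. It shows how a single tree can back two action space specifications: a flat list of valid actions for one discrete head, and per-component masks conditioned on the components already chosen.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ActionNode:
    """Hypothetical conditional action tree node (illustration only).

    Each child key is one value of the next action component; a path from
    the root to a leaf spells out one complete, valid action.
    """
    children: dict[int, ActionNode] = field(default_factory=dict)

    def add_path(self, path: tuple[int, ...]) -> None:
        """Register one valid action as a root-to-leaf path."""
        node = self
        for component in path:
            node = node.children.setdefault(component, ActionNode())

    def node_at(self, prefix: tuple[int, ...]) -> ActionNode:
        """Follow an already-chosen prefix (raises KeyError if it is invalid)."""
        node = self
        for component in prefix:
            node = node.children[component]
        return node

    def mask(self, prefix: tuple[int, ...], size: int) -> list[bool]:
        """Validity mask over the next component, conditioned on `prefix`."""
        valid = self.node_at(prefix).children
        return [i in valid for i in range(size)]

    def flatten(self, prefix: tuple[int, ...] = ()) -> list[tuple[int, ...]]:
        """Enumerate every valid full action, e.g. for a flat discrete space."""
        if not self.children:
            return [prefix]
        actions: list[tuple[int, ...]] = []
        for component in sorted(self.children):
            actions.extend(self.children[component].flatten(prefix + (component,)))
        return actions


# Hypothetical components: (unit, action_type, direction).
# Unit 0 may move (type 0) in direction 1 or 3; unit 1 may only attack (type 1) in direction 2.
tree = ActionNode()
for path in [(0, 0, 1), (0, 0, 3), (1, 1, 2)]:
    tree.add_path(path)

print(tree.flatten())        # [(0, 0, 1), (0, 0, 3), (1, 1, 2)]
print(tree.mask((), 2))      # [True, True]
print(tree.mask((1,), 2))    # [False, True]
print(tree.mask((0, 0), 4))  # [False, True, False, True]
```

Because `mask` only consults the subtree under the prefix chosen so far, an agent that samples one component at a time cannot produce an invalid combination, while `flatten` yields the equivalent single discrete action list for algorithms that expect one.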

Citations

Action Space Reduction for Planning Domains
TLDR
This paper proposes an automated way of reducing the action spaces of reinforcement learning (RL) environments by leveraging lifted mutex groups, and shows a significant reduction in the action space size of the RL environments.
GriddlyJS: A Web IDE for Reinforcement Learning
TLDR
This paper introduces GriddlyJS, a web-based Integrated Development Environment (IDE) built on the Griddly engine, which allows researchers to visually design and debug arbitrary, complex PCG grid-world environments through a graphical interface, and to visualize, evaluate, and record the performance of trained agent models.

References

Showing 10 of 25 references
Action Space Shaping in Deep Reinforcement Learning
TLDR
The results show how domain-specific removal of actions and discretization of continuous actions can be crucial for successful learning.
Comparing Observation and Action Representations for Deep Reinforcement Learning in MicroRTS
TLDR
A preliminary study comparing different observation and action space representations for Deep Reinforcement Learning (DRL) in the context of Real-time Strategy (RTS) games shows that the local representation seems to outperform the global representation when training agents on the task of harvesting resources.
Hybrid Actor-Critic Reinforcement Learning in Parameterized Action Space
In this paper, we propose a hybrid architecture of actor-critic algorithms for reinforcement learning in parameterized action space, which consists of multiple parallel sub-actor networks to decompose…
Graph Constrained Reinforcement Learning for Natural Language Action Spaces
TLDR
This paper presents KG-A2C, an agent that builds a dynamic knowledge graph while exploring and generates actions using a template-based action space. It argues that the dual uses of the knowledge graph, to reason about game state and to constrain natural language generation, are the keys to scalable exploration of combinatorially large natural language actions.
A Closer Look at Invalid Action Masking in Policy Gradient Algorithms
TLDR
This paper shows that the standard working mechanism of invalid action masking corresponds to valid policy gradient updates and works by applying a state-dependent differentiable function during the calculation of the action probability distribution.
Dota 2 with Large Scale Deep Reinforcement Learning
TLDR
By defeating the Dota 2 world champion (Team OG), OpenAI Five demonstrates that self-play reinforcement learning can achieve superhuman performance on a difficult task.
StarCraft II: A New Challenge for Reinforcement Learning
TLDR
This paper introduces SC2LE (StarCraft II Learning Environment), a reinforcement learning environment based on the StarCraft II game that offers a new and challenging setting for exploring deep reinforcement learning algorithms and architectures, and gives initial baseline results for neural networks trained on a dataset of human game replays to predict game outcomes and player actions.
Evolutionary MCTS for Multi-Action Adversarial Games
TLDR
Evolutionary Monte Carlo Tree Search (EMCTS) is introduced, combining the tree search of MCTS with the sequence-based optimization of EAs to tackle turn-based multi-action adversarial games.
Reinforcement Learning with Parameterized Actions
TLDR
This paper introduces the Q-PAMDP algorithm for learning in Markov decision processes with parameterized actions (discrete actions with continuous parameters), shows that it converges to a local optimum, and compares it to direct policy search in the goal-scoring and Platform domains.
The StarCraft Multi-Agent Challenge
TLDR
This paper proposes the StarCraft Multi-Agent Challenge (SMAC), based on the popular real-time strategy game StarCraft II, as a benchmark problem, and open-sources a deep multi-agent RL framework that includes state-of-the-art algorithms.
...
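The invalid action masking mechanism summarised in the reference list (A Closer Look at Invalid Action Masking in Policy Gradient Algorithms) can be illustrated with a minimal sketch, assuming a single categorical action head. `masked_softmax` is a hypothetical helper, and replacing invalid logits with the constant -1e8 is an assumed choice, not a value taken from that paper. The per-state mask supplies the state dependence: invalid entries are swapped for a constant before the softmax, so they receive probability 0.0 and the remaining mass is renormalised over the valid actions.

```python
from __future__ import annotations

import math
import random


def masked_softmax(logits: list[float], mask: list[bool]) -> list[float]:
    """Action probabilities with invalid actions masked out (illustrative sketch)."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Assumed masking constant: exp(-1e8 - m) underflows to 0.0.
    masked = [x if valid else -1e8 for x, valid in zip(logits, mask)]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(masked)
    exps = [math.exp(x - m) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


logits = [1.0, 2.0, 0.5, 3.0]
mask = [True, False, True, False]  # in this (hypothetical) state, actions 1 and 3 are invalid
probs = masked_softmax(logits, mask)
# Sampling with these weights can only return a valid action (0 or 2).
action = random.choices(range(len(probs)), weights=probs)[0]
```

Since the masked entries are constants, the output does not depend on the original invalid logits at all, so those logits receive zero gradient through this function; changing them leaves `probs` unchanged.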