Action Guidance: Getting the Best of Sparse Rewards and Shaped Rewards for Real-time Strategy Games
@article{Huang2020ActionGG, title={Action Guidance: Getting the Best of Sparse Rewards and Shaped Rewards for Real-time Strategy Games}, author={Shengyi Huang and Santiago Onta\~{n}\'{o}n}, journal={ArXiv}, year={2020}, volume={abs/2010.03956} }
Training agents using Reinforcement Learning in games with sparse rewards is a challenging problem, since large amounts of exploration are required to retrieve even the first reward. To tackle this problem, a common approach is to use reward shaping to help exploration. However, an important drawback of reward shaping is that agents sometimes learn to optimize the shaped reward instead of the true objective. In this paper, we present a novel technique that we call action guidance that…
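The pitfall the abstract describes can be made concrete with a toy calculation, a hedged sketch under assumed numbers (the +1 harvest bonus, +10 win reward, and episode lengths are hypothetical, not taken from the paper): a hand-crafted shaped reward can admit a degenerate policy that farms the shaped signal forever and never pursues the true sparse objective.

```python
# Toy illustration of the reward-shaping pitfall: an agent that optimizes
# the shaped reward instead of the true objective.

def sparse_reward(event):
    # True objective: reward only on winning.
    return 10.0 if event == "win" else 0.0

def shaped_reward(event):
    # Hypothetical shaping: +1 per resource harvested, on top of the sparse reward.
    return 1.0 if event == "harvest" else sparse_reward(event)

GAMMA = 0.99

# Discounted return of a degenerate "harvest forever" loop vs. winning at step 50.
harvest_forever = sum(GAMMA ** t * shaped_reward("harvest") for t in range(500))
win_at_step_50 = GAMMA ** 50 * shaped_reward("win")
# Under the shaped reward the degenerate loop dominates, even though it
# scores zero under the true sparse objective.
```

Here `harvest_forever` is roughly 99 while `win_at_step_50` is roughly 6, so a shaped-reward optimizer prefers the loop that the true objective values at zero.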
5 Citations
Policy Fusion for Adaptive and Customizable Reinforcement Learning Agents
- Computer Science, 2021 IEEE Conference on Games (CoG)
- 2021
This article proposes four different policy fusion methods for combining pre-trained policies and demonstrates how these methods can be used in combination with Inverse Reinforcement Learning in order to create intelligent agents with specific behavioral styles as chosen by game designers, without having to define many and possibly poorly-designed reward functions.
MAIDRL: Semi-centralized Multi-Agent Reinforcement Learning using Agent Influence
- Computer Science, 2021 IEEE Conference on Games (CoG)
- 2021
A novel semi-centralized deep reinforcement learning algorithm for mixed cooperative and competitive multi-agent environments, using a robust DenseNet-style actor-critic network that controls multiple agents by combining local observations with abstracted global information to compete against opponent agents.
MARL-Based Dual Reward Model on Segmented Actions for Multiple Mobile Robots in Automated Warehouse Environment
- Business, Applied Sciences
- 2022
Simple, labor-intensive tasks performed by workers on the job site are rapidly being digitized. In the work environment of logistics warehouses and manufacturing plants, moving goods to a designated…
Transfer Dynamics in Emergent Evolutionary Curricula
- Biology, IEEE Transactions on Games
- 2022
The main question addressed is how open-ended learning actually works, focusing in particular on the role of transfer of policies from one evolutionary branch (“species”) to another; the most insightful finding is that inter-species transfer is crucial to the system’s success.
References
SHOWING 1-10 OF 37 REFERENCES
Learning by Playing - Solving Sparse Reward Tasks from Scratch
- Computer Science, ICML
- 2018
The key idea behind the method is that active (learned) scheduling and execution of auxiliary policies allows the agent to efficiently explore its environment - enabling it to excel at sparse reward RL.
Action Space Shaping in Deep Reinforcement Learning
- Computer Science, 2020 IEEE Conference on Games (CoG)
- 2020
The results show how domain-specific removal of actions and discretization of continuous actions can be crucial for successful learning.
Hindsight Experience Replay
- Computer Science, NIPS
- 2017
A novel technique is presented which allows sample-efficient learning from rewards which are sparse and binary and therefore avoid the need for complicated reward engineering and may be seen as a form of implicit curriculum.
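The relabeling idea behind that paper can be sketched in a few lines, a hedged illustration only (the tuple layout and function names are hypothetical, not the paper's API): a failed episode toward `goal` is stored a second time with the goal replaced by the state actually reached, so the sparse binary reward becomes informative for at least one stored transition.

```python
# Minimal sketch of hindsight relabeling, the core idea behind
# Hindsight Experience Replay.

def binary_reward(achieved, goal):
    # Sparse, binary reward: 0 on success, -1 otherwise.
    return 0.0 if achieved == goal else -1.0

def relabel(episode, final_achieved):
    # episode: list of (state, action, achieved, goal) tuples (hypothetical layout).
    # Returns transitions re-stored with the achieved final state as the goal.
    relabeled = []
    for state, action, achieved, _ in episode:
        relabeled.append((state, action, achieved, final_achieved,
                          binary_reward(achieved, final_achieved)))
    return relabeled
```

In a replay buffer, these relabeled transitions sit alongside the originals, giving the agent successful examples to learn from without any reward engineering.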
Mastering Complex Control in MOBA Games with Deep Reinforcement Learning
- Computer Science, AAAI
- 2020
A deep reinforcement learning framework to tackle the problem of complex action control in Multi-player Online Battle Arena (MOBA) 1v1 games is presented; the framework has low coupling and high scalability, enabling efficient exploration at large scale.
Large-Scale Study of Curiosity-Driven Learning
- Computer Science, ICLR
- 2019
This paper performs the first large-scale study of purely curiosity-driven learning, i.e. without any extrinsic rewards, across 54 standard benchmark environments, including the Atari game suite, and shows surprisingly good performance.
On Reinforcement Learning for Full-length Game of StarCraft
- Computer Science, AAAI
- 2019
A hierarchical approach, where the hierarchy involves two levels of abstraction, which can reduce the action space by an order of magnitude yet remain effective, and a curriculum transfer learning approach that trains the agent against opponents of increasing difficulty, from the simplest to harder ones.
StarCraft II: A New Challenge for Reinforcement Learning
- Computer Science, ArXiv
- 2017
This paper introduces SC2LE (StarCraft II Learning Environment), a reinforcement learning environment based on the StarCraft II game that offers a new and challenging environment for exploring deep reinforcement learning algorithms and architectures and gives initial baseline results for neural networks trained from this data to predict game outcomes and player actions.
Comparing Observation and Action Representations for Deep Reinforcement Learning in MicroRTS
- Computer Science, ArXiv
- 2019
A preliminary study comparing different observation and action space representations for Deep Reinforcement Learning (DRL) in the context of Real-time Strategy (RTS) games shows that the local representation seems to outperform the global representation when training agents with the task of harvesting resources.
Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping
- Computer Science, ICML
- 1999
Conditions under which modifications to the reward function of a Markov decision process preserve the optimal policy are investigated, shedding light on the practice of reward shaping, a method used in reinforcement learning whereby additional training rewards are used to guide the learning agent.
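The invariance result from that paper can be sketched concretely, assuming a generic discounted MDP (the goal position and potential function below are hypothetical): a shaping term of the form F(s, s') = γΦ(s') − Φ(s) telescopes along any trajectory, which is why it leaves the optimal policy unchanged.

```python
# Minimal sketch of potential-based reward shaping (Ng et al., 1999).
# The shaped reward adds F(s, s') = gamma * phi(s') - phi(s); along any
# trajectory the discounted shaping terms telescope to
# gamma^T * phi(s_T) - phi(s_0), independent of the path taken.

GAMMA = 0.99

def phi(state):
    # Hypothetical potential: negative distance to a goal at position 10.
    return -abs(10 - state)

def shaped_reward(state, next_state, env_reward):
    return env_reward + GAMMA * phi(next_state) - phi(state)
```

Because the accumulated shaping depends only on the start and end states, no policy can be made to look better than another by the shaping term alone.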
Apprenticeship learning via inverse reinforcement learning
- Computer Science, ICML
- 2004
This work thinks of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert, based on using "inverse reinforcement learning" to try to recover the unknown reward function.