Corpus ID: 53113742

Parametrized Deep Q-Networks Learning: Reinforcement Learning with Discrete-Continuous Hybrid Action Space

Authors: Jiechao Xiong, Qing Wang, Zhuoran Yang, Peng Sun, Lei Han, Yang Zheng, Haobo Fu, T. Zhang, Ji Liu, Han Liu
Most existing deep reinforcement learning (DRL) frameworks consider either a discrete action space or a continuous action space, but not both. Motivated by applications in computer games, we consider the scenario with a discrete-continuous hybrid action space. To handle such a space, previous works either approximate it by discretization or relax it into a continuous set. In this paper, we propose a parametrized deep Q-network (P-DQN) framework for the hybrid action space without…
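The P-DQN scheme sketched in the abstract can be illustrated with a toy action-selection routine: for each discrete action, a deterministic policy first proposes its continuous parameter, and the Q-network is then maximized over the discrete choices. This is a minimal sketch in which the two networks are replaced by hypothetical closed-form stand-ins (`param_policy`, `q_value` are illustrative names, not from the paper):

```python
def param_policy(state, k):
    """Stand-in for the deterministic actor x_k(s): one continuous
    parameter per discrete action k (hypothetical toy function)."""
    return 0.5 * state + 0.1 * k

def q_value(state, k, x):
    """Stand-in for the Q-network Q(s, k, x_k) (hypothetical toy function)."""
    return -(x - k) ** 2 + state

def select_hybrid_action(state, num_discrete):
    # For each discrete action k, compute its continuous parameter first,
    # then pick the k that maximizes Q(s, k, x_k(s)).
    candidates = [(k, param_policy(state, k)) for k in range(num_discrete)]
    return max(candidates, key=lambda kx: q_value(state, kx[0], kx[1]))

k, x = select_hybrid_action(state=1.0, num_discrete=3)
# k is a discrete action index, x its associated continuous parameter
```

The point of the sketch is only the two-stage structure (continuous parameters computed per discrete action, then a discrete argmax), which is what distinguishes this family from pure discretization or pure relaxation.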


HyAR: Addressing Discrete-Continuous Action Reinforcement Learning via Hybrid Action Representation

This paper proposes Hybrid Action Representation (HyAR) to learn a compact and decodable latent representation space for the original hybrid action space and demonstrates the superiority of HyAR when compared with previous baselines, especially for high-dimensional action spaces.

Deep Multi-Agent Reinforcement Learning with Discrete-Continuous Hybrid Action Spaces

The empirical results show that both Deep MAPQN and Deep MAHHQN are effective and significantly outperform the existing independent deep parameterized Q-learning method.

Multi-Pass Q-Networks for Deep Reinforcement Learning with Parameterised Action Spaces

It is empirically demonstrated that MP-DQN significantly outperforms P-DQN and other previous algorithms in terms of data efficiency and converged policy performance on the Platform, Robot Soccer Goal, and Half Field Offense domains.

An Overview of the Action Space for Deep Reinforcement Learning

The differences and connections between discrete action space, continuous action space and discrete-continuous hybrid action space are analyzed, and various reinforcement learning algorithms suitable for different action spaces are elaborated.


This work proposes adopting a different parametrization scheme for state–action value networks based on neural ordinary differential equations (NODEs) as a scalable, plug-and-play approach for parametrized action spaces.

Hybrid Actor-Critic Reinforcement Learning in Parameterized Action Space

In this paper we propose a hybrid architecture of actor-critic algorithms for reinforcement learning in parameterized action space, which consists of multiple parallel sub-actor networks to decompose…

Discrete and Continuous Action Representation for Practical RL in Video Games

It is shown that Hybrid SAC can successfully solve a high-speed driving task in one of the authors' games, and is competitive with the state-of-the-art on parameterized actions benchmark tasks.

Exploration in Deep Reinforcement Learning: From Single-Agent to Multi-Agent Domain

A comprehensive survey of existing exploration methods for both single-agent and multi-agent RL, identifying several key challenges to efficient exploration and pointing out a few future directions.

Hierarchical Advantage for Reinforcement Learning in Parameterized Action Space

The hierarchical architecture of the advantage function, which is referred to as the hierarchical advantage, helps to stabilize the learning and leads to a better performance in reinforcement learning in parameterized action space.

Distributed Reinforcement Learning with Self-Play in Parameterized Action Space

A distributed self-play training framework for an extended proximal policy optimization (PPO) algorithm that learns to act in parameterized action space and plays against a group of opponents, i.e., a league.

Deep Reinforcement Learning in Parameterized Action Space

This paper represents a successful extension of deep reinforcement learning to the class of parameterized action space MDPs within the domain of simulated RoboCup soccer, which features a small set of discrete action types each of which is parameterized with continuous variables.

Continuous control with deep reinforcement learning

This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.

Continuous Deep Q-Learning with Model-based Acceleration

This paper derives a continuous variant of the Q-learning algorithm, which it is called normalized advantage functions (NAF), as an alternative to the more commonly used policy gradient and actor-critic methods, and substantially improves performance on a set of simulated robotic control tasks.
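The core idea behind NAF, as summarized above, is to constrain Q so that its continuous maximizer is available in closed form: Q(s, a) = V(s) − ½·p(s)·(a − μ(s))², with p(s) > 0, so the greedy action is simply μ(s). A minimal one-dimensional sketch, with the network outputs replaced by hypothetical constants:

```python
def naf_q(a, mu, p, v):
    """Quadratic (normalized advantage) form of Q in one dimension:
    Q(s, a) = V(s) - 0.5 * p * (a - mu)**2, maximized exactly at a = mu."""
    return v - 0.5 * p * (a - mu) ** 2

# Pretend network outputs for some state s (illustrative values only).
mu, p, v = 0.3, 2.0, 1.5

q_at_mu = naf_q(mu, mu, p, v)       # at the maximizer, Q(s, mu) == V(s)
q_off = naf_q(mu + 0.4, mu, p, v)   # any other action scores strictly lower
```

This closed-form argmax is what lets NAF keep Q-learning's simple greedy update in continuous action spaces, avoiding a separate actor network.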

Active Exploration and Parameterized Reinforcement Learning Applied to a Simulated Human-Robot Interaction Task

This work proposes an active exploration algorithm for RL in structured (parameterized) continuous action space and shows that it outperforms continuous parameterized RL both without active exploration and with active exploration based on uncertainty variations measured by a Kalman-Q-learning algorithm.

Deep Reinforcement Learning with Double Q-Learning

This paper proposes a specific adaptation to the DQN algorithm and shows that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games.
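The Double DQN adaptation referred to above decouples action selection from action evaluation in the target: the online network chooses the argmax action at the next state, while the target network supplies its value. A minimal sketch with toy Q-value lists (the numbers are illustrative):

```python
def double_dqn_target(reward, gamma, q_online_next, q_target_next):
    """Double DQN target: select the next action with the online
    network, evaluate it with the target network."""
    a_star = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[a_star]

q_online_next = [1.0, 3.0, 2.0]   # online net picks action 1...
q_target_next = [0.5, 1.0, 4.0]   # ...but the target net evaluates it

y = double_dqn_target(reward=1.0, gamma=0.9,
                      q_online_next=q_online_next,
                      q_target_next=q_target_next)
```

Note that vanilla DQN would instead use max(q_target_next) = 4.0 here, yielding a larger target; the decoupling is what reduces the overestimation bias.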

Dueling Network Architectures for Deep Reinforcement Learning

This paper presents a new neural network architecture for model-free reinforcement learning that leads to better policy evaluation in the presence of many similar-valued actions and enables the RL agent to outperform the state-of-the-art on the Atari 2600 domain.
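The dueling architecture described above splits the network into a state-value stream V(s) and an advantage stream A(s, a), recombined as Q(s, a) = V(s) + A(s, a) − mean over actions of A. A minimal sketch of that aggregation step, with hypothetical stream outputs in place of the network:

```python
def dueling_q(v, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').
    Subtracting the mean advantage makes the V/A decomposition identifiable."""
    mean_adv = sum(advantages) / len(advantages)
    return [v + a - mean_adv for a in advantages]

# Illustrative stream outputs for one state.
q = dueling_q(v=2.0, advantages=[1.0, -1.0, 0.0])
```

Because V(s) is shared across all actions, it is updated by every sample regardless of which action was taken, which is why the architecture helps precisely when many actions have similar values.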

Asynchronous Methods for Deep Reinforcement Learning

A conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers and shows that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using a visual input.

Playing Atari with Deep Reinforcement Learning

This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning, which outperforms all previous approaches on six of the games and surpasses a human expert on three of them.

Human-level control through deep reinforcement learning

This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning

Averaged-DQN is a simple extension to the DQN algorithm, based on averaging previously learned Q-values estimates, which leads to a more stable training procedure and improved performance by reducing approximation error variance in the target values.
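The averaging step in Averaged-DQN can be sketched as follows: the next-state Q-values of the K most recently learned networks are averaged per action before taking the max in the target. Each "network snapshot" below is a toy list of Q-values per action (illustrative numbers only):

```python
def averaged_dqn_target(reward, gamma, q_snapshots_next):
    """Averaged-DQN target: average per-action Q-values across the K
    previous learned networks, then bootstrap from the max."""
    k = len(q_snapshots_next)
    num_actions = len(q_snapshots_next[0])
    avg_q = [sum(snap[a] for snap in q_snapshots_next) / k
             for a in range(num_actions)]
    return reward + gamma * max(avg_q)

snapshots = [[1.0, 4.0],   # Q-values from two previous
             [3.0, 2.0]]   # network snapshots

y = averaged_dqn_target(reward=0.0, gamma=0.9, q_snapshots_next=snapshots)
```

Averaging before the max reduces the variance of the target values, which is the source of the stabilization effect the summary describes.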