• Corpus ID: 5389801

Dueling Network Architectures for Deep Reinforcement Learning

@inproceedings{wang2016dueling,
  title={Dueling Network Architectures for Deep Reinforcement Learning},
  author={Ziyu Wang and Tom Schaul and Matteo Hessel and Hado van Hasselt and Marc Lanctot and Nando de Freitas},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2016}
}
In recent years there have been many successes of using deep representations in reinforcement learning. Our dueling network represents two separate estimators: one for the state value function and one for the state-dependent action advantage function. The main benefit of this factoring is to generalize learning across actions without imposing any change to the underlying reinforcement learning algorithm. Our results show that this architecture leads to better policy evaluation in the presence of many similar-valued actions.
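The aggregation the abstract describes can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the linear weights and feature vector are illustrative placeholders, and the mean-advantage subtraction is the identifiability correction the dueling architecture uses when combining the two streams.

```python
import numpy as np

def dueling_q_values(features, w_value, w_adv):
    """Combine a scalar state value V(s) and per-action advantages A(s, a)
    into Q-values, subtracting the mean advantage for identifiability:
    Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""
    v = features @ w_value          # shape (1,): state value
    a = features @ w_adv            # shape (n_actions,): advantages
    return v + (a - a.mean())       # broadcasts to one Q-value per action

# toy example: 4 shared features, 3 actions (weights random, not learned)
rng = np.random.default_rng(0)
phi = rng.standard_normal(4)
w_v = rng.standard_normal((4, 1))
w_a = rng.standard_normal((4, 3))
q = dueling_q_values(phi, w_v, w_a)
```

A useful property of this aggregation: the mean of the resulting Q-values equals the state value, since the centered advantages average to zero.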

Figures and Tables from this paper

A State Representation Dueling Network for Deep Reinforcement Learning

  • Haomin Qiu, F. Liu
  • Computer Science
    2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI)
  • 2020
A state representation dueling network is introduced, which provides an auxiliary task designed to be combined with other reinforcement learning algorithms to improve the performance of Deep RL.

Deep Reinforcement Learning with Hidden Layers on Future States

This work proposes a method that predicts future states using Long Short Term Memory (LSTM), such that the agent can look ahead without the emulator, and applies this method to the asynchronous advantage actor-critic (A3C) architecture.

Group Equivariant Deep Reinforcement Learning

It is demonstrated that equivariant architectures can dramatically enhance the performance and sample efficiency of RL agents in a highly symmetric environment while requiring fewer parameters and are robust to changes in the environment caused by affine transformations.

Compression and Localization in Reinforcement Learning for ATARI Games

This work compress networks to drastically reduce the number of parameters in them, and applies a global max pool after the final convolution layer, which allows for weakly supervised object localization, improving the ability to identify the agent's points of focus.

Deep Reinforcement Learning With Macro-Actions

This paper focuses on macro-actions, and evaluates these on different Atari 2600 games, where they yield significant improvements in learning speed and can even achieve better scores than DQN.

Biologically inspired architectures for sample-efficient deep reinforcement learning

This work shows empirically that in the low-data regime, it is possible to learn online policies with 2 to 10 times fewer total coefficients, with little to no loss of performance.

Action Branching Architectures for Deep Reinforcement Learning

The empirical results show that the proposed agent scales gracefully to environments with increasing action dimensionality and indicate the significance of the shared decision module in coordination of the distributed action branches.
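The branching idea summarized above can be sketched with a shared representation feeding one small Q-head per action dimension; the joint action is the per-branch argmax. This is a hedged illustration with made-up shapes and random weights, not the paper's network.

```python
import numpy as np

def branching_q_values(shared, branch_weights):
    """One Q-vector per action dimension from a shared representation.
    Output size grows linearly with the number of action dimensions,
    instead of exponentially as with a single joint-action Q-head."""
    return [shared @ w for w in branch_weights]

# toy setup: 16-d shared features, 3 action dimensions with 5 choices each
rng = np.random.default_rng(0)
shared = rng.standard_normal(16)
branches = [rng.standard_normal((16, 5)) for _ in range(3)]
qs = branching_q_values(shared, branches)
joint_action = [int(np.argmax(q)) for q in qs]  # one choice per dimension
```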

Shallow Updates for Deep Reinforcement Learning

This work proposes a hybrid approach -- the Least Squares Deep Q-Network (LS-DQN), which combines rich feature representations learned by a DRL algorithm with the stability of a linear least squares method by periodically re-training the last hidden layer of a DRL network with a batch least squares update.
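The batch least-squares step mentioned above amounts to re-fitting a linear layer in closed form. A minimal sketch, assuming regression targets and a small ridge term for numerical stability (the feature matrix and targets here are synthetic):

```python
import numpy as np

def relearn_last_layer(features, targets, l2=1e-3):
    """Re-fit a linear last layer by regularized batch least squares:
    w = (Phi^T Phi + l2*I)^-1 Phi^T y."""
    d = features.shape[1]
    A = features.T @ features + l2 * np.eye(d)
    b = features.T @ targets
    return np.linalg.solve(A, b)

# toy data: features standing in for a penultimate layer's activations
rng = np.random.default_rng(0)
phi = rng.standard_normal((256, 8))
w_true = rng.standard_normal((8, 4))
w = relearn_last_layer(phi, phi @ w_true, l2=1e-6)  # recovers w_true closely
```

With noiseless targets and a tiny regularizer, the solve recovers the generating weights almost exactly, which is what makes the periodic re-fit cheap and stable.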

Value Prediction Network

This paper proposes a novel deep reinforcement learning architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network; VPN outperforms Deep Q-Network on several Atari games even with short-lookahead planning.

Distributed Deep Reinforcement Learning: An Overview

A survey of the role of distributed approaches in deep reinforcement learning, studying the key research works that have shaped how distributed methods are used in DRL, evaluating these methods on different tasks, and comparing their performance with each other and with single-actor, single-learner agents.



Massively Parallel Methods for Deep Reinforcement Learning

This work presents the first massively distributed architecture for deep reinforcement learning, using a distributed neural network to represent the value function or behaviour policy, and a distributed store of experience to implement the Deep Q-Network algorithm.

Deep Reinforcement Learning with Double Q-Learning

This paper proposes a specific adaptation to the DQN algorithm and shows that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games.
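The adaptation referred to above is the Double DQN target: the online network selects the next action, the target network evaluates it, which reduces the overestimation bias of plain Q-learning's max. A small numerical sketch (the Q-value arrays are made-up numbers):

```python
import numpy as np

def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Double DQN bootstrap target: decouple action selection (online net)
    from action evaluation (target net)."""
    a_star = int(np.argmax(q_online_next))   # selection by online network
    bootstrap = q_target_next[a_star]        # evaluation by target network
    return reward + (0.0 if done else gamma * bootstrap)

# online net prefers action 1; target net scores that action at 0.5
y = double_dqn_target(reward=1.0, done=False, gamma=0.99,
                      q_online_next=np.array([0.2, 0.9, 0.1]),
                      q_target_next=np.array([0.4, 0.5, 0.3]))
# y = 1.0 + 0.99 * 0.5 = 1.495
```

Note that a standard DQN target would instead bootstrap from max(q_target_next) = 0.4 here, illustrating how the two estimates can disagree.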

Human-level control through deep reinforcement learning

This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models

This paper considers the challenging Atari games domain, and proposes a new exploration method based on assigning exploration bonuses from a concurrently learned model of the system dynamics that provides the most consistent improvement across a range of games that pose a major challenge for prior methods.

Advances in optimizing recurrent networks

Experiments reported here evaluate the use of clipping gradients, spanning longer time ranges with leaky integration, advanced momentum techniques, using more powerful output probability models, and encouraging sparser gradients to help symmetry breaking and credit assignment.

Reinforcement learning for robots using neural networks

This dissertation concludes that it is possible to build artificial agents that can acquire complex control policies effectively through reinforcement learning, enabling its application to complex robot-learning problems.

End-to-End Training of Deep Visuomotor Policies

This paper develops a method that can be used to learn policies that map raw image observations directly to torques at the robot's motors, trained using a partially observed guided policy search method, with supervision provided by a simple trajectory-centric reinforcement learning method.

Prioritized Experience Replay

A framework for prioritizing experience so as to replay important transitions more frequently and therefore learn more efficiently; it is applied to Deep Q-Networks (DQN), a reinforcement learning algorithm that achieved human-level performance across many Atari games.
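The proportional variant of this prioritization scheme can be sketched directly: sample transition i with probability proportional to its priority raised to an exponent, and correct the resulting bias with importance-sampling weights. A minimal sketch (here the IS exponent is fixed at 1; the paper anneals a separate exponent, and priorities are made-up numbers):

```python
import numpy as np

def sample_proportional(priorities, batch_size, alpha=0.6, rng=None):
    """Sample indices with P(i) = p_i^alpha / sum_j p_j^alpha (proportional
    prioritized replay); return importance-sampling weights that correct
    the bias of the non-uniform sampling, normalized by their maximum."""
    rng = rng or np.random.default_rng()
    p = np.asarray(priorities, dtype=float) ** alpha
    probs = p / p.sum()
    idx = rng.choice(len(p), size=batch_size, p=probs)
    weights = (len(p) * probs[idx]) ** -1.0   # (N * P(i))^-1, i.e. beta = 1
    return idx, weights / weights.max()

idx, w = sample_proportional([5.0, 1.0, 0.1, 3.0], batch_size=32,
                             rng=np.random.default_rng(0))
```

High-priority transitions dominate the sampled batch, while their IS weights are correspondingly small, so frequently replayed transitions contribute less per gradient step.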

Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning

The central idea is to use the slow planning-based agents to provide training data for a deep-learning architecture capable of real-time play; new agents based on this idea are proposed and shown to outperform DQN.

Multi-Agent Residual Advantage Learning with General Function Approximation

A new algorithm, Incremental Delta-Delta (IDD), is presented, which extends Jacobs' (1988) Delta-Delta rule for use in incremental training; it differs from Sutton's Incremental Delta-Bar-Delta in that it does not require the use of a trace and is amenable to use with general function approximation systems.