• Corpus ID: 5389801

Dueling Network Architectures for Deep Reinforcement Learning

@article{Wang2016DuelingNA,
  title={Dueling Network Architectures for Deep Reinforcement Learning},
  author={Ziyu Wang and Tom Schaul and Matteo Hessel and Hado van Hasselt and Marc Lanctot and Nando de Freitas},
  journal={ArXiv},
  year={2016},
  volume={abs/1511.06581}
}
In recent years there have been many successes of using deep representations in reinforcement learning. [...] Our dueling network represents two separate estimators: one for the state value function and one for the state-dependent action advantage function. The main benefit of this factoring is to generalize learning across actions without imposing any change to the underlying reinforcement learning algorithm. Our results show that this architecture leads to better policy evaluation in the presence…
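
The two estimators described above have to be merged into per-action Q-values before an unchanged Q-learning algorithm can use them. The truncated abstract does not say how, so the following is only a minimal sketch, assuming the commonly used mean-subtracted combination Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'). The function name `dueling_q_values` and the input numbers are hypothetical, and plain Python lists stand in for network outputs.

```python
def dueling_q_values(state_value, advantages):
    """Combine a scalar state value V(s) with per-action advantages A(s, a).

    Assumed aggregation (not stated in the abstract above):
        Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')
    """
    if not advantages:
        raise ValueError("advantages must contain at least one action")
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + (a - mean_advantage) for a in advantages]


# Hypothetical outputs of the two streams for one state with three actions.
q_values = dueling_q_values(state_value=2.0, advantages=[1.0, 0.0, -1.0])

# Greedy action selection works exactly as with an ordinary Q-network.
greedy_action = max(range(len(q_values)), key=q_values.__getitem__)
```

Subtracting the mean makes the result insensitive to a constant offset in the advantage stream: under this assumption, `dueling_q_values(2.0, [6.0, 5.0, 4.0])` gives the same Q-values as the call above.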

A State Representation Dueling Network for Deep Reinforcement Learning
  • Haomin Qiu, F. Liu
  • Computer Science
    2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI)
  • 2020
TLDR
A state representation dueling network is introduced, which provides an auxiliary task designed to be combined with other reinforcement learning algorithms to improve the performance of Deep RL.
Deep Reinforcement Learning with Hidden Layers on Future States
TLDR
This work proposes a method that predicts future states using Long Short Term Memory (LSTM), such that the agent can look ahead without the emulator, and applies this method to the asynchronous advantage actor-critic (A3C) architecture.
Group Equivariant Deep Reinforcement Learning
TLDR
It is demonstrated that equivariant architectures can dramatically enhance the performance and sample efficiency of RL agents in a highly symmetric environment while requiring fewer parameters and are robust to changes in the environment caused by affine transformations.
Compression and Localization in Reinforcement Learning for ATARI Games
TLDR
This work compresses networks to drastically reduce their number of parameters, and applies a global max pool after the final convolution layer, which allows for weakly supervised object localization, improving the ability to identify the agent's points of focus.
Deep Reinforcement Learning With Macro-Actions
TLDR
This paper focuses on macro-actions, and evaluates these on different Atari 2600 games, where they yield significant improvements in learning speed and can even achieve better scores than DQN.
Biologically inspired architectures for sample-efficient deep reinforcement learning
TLDR
This work shows empirically that in the low-data regime, it is possible to learn online policies with 2 to 10 times fewer total coefficients, with little to no loss of performance.
Action Branching Architectures for Deep Reinforcement Learning
TLDR
The empirical results show that the proposed agent scales gracefully to environments with increasing action dimensionality and indicate the significance of the shared decision module in coordination of the distributed action branches.
Shallow Updates for Deep Reinforcement Learning
TLDR
This work proposes a hybrid approach, the Least Squares Deep Q-Network (LS-DQN), which combines rich feature representations learned by a DRL algorithm with the stability of a linear least squares method by periodically re-training the last hidden layer of a DRL network with a batch least squares update.
Value Prediction Network
TLDR
This paper proposes a novel deep reinforcement learning architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network, which outperforms Deep Q-Network on several Atari games even with short-lookahead planning.
Distributed Deep Reinforcement Learning: An Overview
TLDR
A survey of the role of distributed approaches in deep reinforcement learning, studying the key research works that have had a significant impact on how distributed methods are used in DRL, evaluating these methods on different tasks, and comparing their performance with each other and with single actor and learner agents.

References

SHOWING 1-10 OF 30 REFERENCES
Massively Parallel Methods for Deep Reinforcement Learning
TLDR
This work presents the first massively distributed architecture for deep reinforcement learning, using a distributed neural network to represent the value function or behaviour policy, and a distributed store of experience to implement the Deep Q-Network algorithm.
Deep Reinforcement Learning with Double Q-Learning
TLDR
This paper proposes a specific adaptation to the DQN algorithm and shows that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games.
Human-level control through deep reinforcement learning
TLDR
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
TLDR
This paper considers the challenging Atari games domain, and proposes a new exploration method based on assigning exploration bonuses from a concurrently learned model of the system dynamics that provides the most consistent improvement across a range of games that pose a major challenge for prior methods.
Advances in optimizing recurrent networks
TLDR
Experiments reported here evaluate the use of clipping gradients, spanning longer time ranges with leaky integration, advanced momentum techniques, using more powerful output probability models, and encouraging sparser gradients to help symmetry breaking and credit assignment.
Reinforcement learning for robots using neural networks
TLDR
This dissertation concludes that it is possible to build artificial agents that can acquire complex control policies effectively by reinforcement learning and enable its applications to complex robot-learning problems.
End-to-End Training of Deep Visuomotor Policies
TLDR
This paper develops a method that can be used to learn policies that map raw image observations directly to torques at the robot's motors, trained using a partially observed guided policy search method, with supervision provided by a simple trajectory-centric reinforcement learning method.
Prioritized Experience Replay
TLDR
A framework for prioritizing experience, so as to replay important transitions more frequently, and therefore learn more efficiently, in Deep Q-Networks, a reinforcement learning algorithm that achieved human-level performance across many Atari games.
Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning
TLDR
The central idea is to use the slow planning-based agents to provide training data for a deep-learning architecture capable of real-time play; new agents based on this idea are proposed and shown to outperform DQN.
Multi-Agent Residual Advantage Learning with General Function Approximation.
TLDR
A new algorithm, Incremental Delta-Delta (IDD), is presented, which extends Jacobs' (1988) Delta-Delta for use in incremental training, and differs from Sutton's Incremental Delta-Bar-Delta in that it does not require the use of a trace and is amenable for use with general function approximation systems.