Corpus ID: 198897852

Terminal Prediction as an Auxiliary Task for Deep Reinforcement Learning

Bilal Kartal, Pablo Hernandez-Leal, Matthew E. Taylor
Deep reinforcement learning has achieved great successes in recent years, but open challenges remain, such as convergence to locally optimal policies and sample inefficiency. [...] Key result: results on Atari games and the BipedalWalker domain suggest that A3C-TP outperforms standard A3C in most of the tested domains and performs comparably in the others.
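As a rough sketch of the terminal-prediction idea, the auxiliary head can be trained to regress how close each state is to the episode's end, with that regression loss added to the usual A3C objective. The linear `t / (T - 1)` target, the mean-squared-error form, and the `beta_tp` weight below are illustrative assumptions, not necessarily the paper's exact formulation.

```python
def terminal_prediction_targets(episode_length):
    """Normalized 'closeness to terminal' targets for one episode.

    Assumes episode_length >= 2; the target rises linearly from 0.0
    at the first step to 1.0 at the terminal step.
    """
    T = episode_length
    return [t / (T - 1) for t in range(T)]

def a3c_tp_loss(policy_loss, value_loss, tp_preds, tp_targets, beta_tp=0.5):
    """Combine standard A3C loss terms with an auxiliary terminal-prediction term.

    The auxiliary term is the mean squared error between the network's
    terminal-prediction head and the normalized targets; beta_tp is an
    assumed weighting coefficient.
    """
    mse = sum((p - y) ** 2 for p, y in zip(tp_preds, tp_targets)) / len(tp_targets)
    return policy_loss + value_loss + beta_tp * mse
```

With a perfect terminal-prediction head the auxiliary term vanishes and the total reduces to the usual A3C objective.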
MERL: Multi-Head Reinforcement Learning
MERL, a general multi-head framework for structuring reinforcement learning by injecting problem knowledge into policy gradient updates, is introduced and defined.
Work in Progress: Temporally Extended Auxiliary Tasks
The overall conclusion is that TD-AE increases the robustness of the A2C algorithm to trajectory length; while promising, further study is required to fully understand the relationship between the auxiliary task's prediction timescale and the agent's performance.
Policy gradient algorithms have proven successful in diverse decision-making and control tasks, but these methods suffer from high sample complexity and instability. [...]
Ensemble and Auxiliary Tasks for Data-Efficient Deep Reinforcement Learning
A refined bias-variance-covariance decomposition is derived to analyze the different ways of learning ensembles and using auxiliary tasks, and the analysis is used to help understand a case study on Atari games under a limited-data constraint.
Learning State Representations from Random Deep Action-conditional Predictions
This work shows that deep action-conditional TD networks with random structures that create random prediction questions about random features yield state representations that are competitive with state-of-the-art hand-crafted value prediction and pixel control auxiliary tasks in both Atari games and DeepMind Lab tasks.
Is Standard Deviation the New Standard? Revisiting the Critic in Deep Policy Gradients
This work builds on recent studies indicating that traditional actor-critic algorithms do not succeed in fitting the true value function, calling for the need to identify a better objective for the critic.
Efficient Searching With MCTS and Imitation Learning: A Case Study in Pommerman
A reinforcement learning approach that combines a more efficient Monte Carlo tree search with action pruning and flexible imitation learning to accelerate search, allowing the agent to avoid meaningless exploration and discover high-level strategies.
On The Effect of Auxiliary Tasks on Representation Dynamics
This work develops the understanding of the relationship between auxiliary tasks, environment structure, and representations by analysing the dynamics of temporal difference algorithms, establishing a connection between the spectral decomposition of the transition operator and the representations induced by a variety of auxiliary tasks.
Multimodal Deep Reinforcement Learning with Auxiliary Task for Obstacle Avoidance of Indoor Mobile Robot
In MDRLAT, a powerful bilinear fusion module is proposed to fully capture the complementary information from two-dimensional laser range findings and depth images, and the generated multimodal representation is fed into a dueling double deep Q-network to output control commands for the mobile robot.
Reinforcement Learning with Unsupervised Auxiliary Tasks
The proposed agent significantly outperforms the previous state of the art on Atari, averaging 880% expert human performance, and on a challenging suite of first-person, three-dimensional Labyrinth tasks achieves a mean 10x speedup in learning while averaging 87% expert human performance.
A Deep Learning Approach for Joint Video Frame and Reward Prediction in Atari Games
This work extends a recently developed deep neural network for video frame prediction in Atari games to enable reward prediction as well: a joint optimization problem is phrased for minimizing both video frame and reward reconstruction loss, and network parameters are adapted accordingly.
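The joint objective can be sketched as a weighted sum of the two reconstruction terms. The squared-error form of both terms and the `reward_weight` coefficient below are assumptions for illustration; the paper's actual loss formulation may differ in detail.

```python
def joint_prediction_loss(frame_pred, frame_true, reward_pred, reward_true,
                          reward_weight=1.0):
    """Joint loss over a predicted next frame and a predicted reward.

    frame_pred / frame_true are flat lists of pixel values; the frame term
    is their mean squared error, the reward term is a squared error, and
    reward_weight (an assumed hyperparameter) trades the two off.
    """
    n = len(frame_true)
    frame_loss = sum((p - t) ** 2 for p, t in zip(frame_pred, frame_true)) / n
    reward_loss = (reward_pred - reward_true) ** 2
    return frame_loss + reward_weight * reward_loss
```

Minimizing this single scalar drives the network to reconstruct both the next frame and the reward, which is the joint-optimization idea the entry above describes.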
A survey and critique of multiagent deep reinforcement learning
A clear overview of current multiagent deep reinforcement learning (MDRL) literature is provided to help unify and motivate future research to take advantage of the abundant literature that exists in a joint effort to promote fruitful research in the multiagent community.
Deep Reinforcement Learning: A Brief Survey
This survey covers central algorithms in deep RL, including the deep Q-network (DQN), trust region policy optimization (TRPO), and the asynchronous advantage actor-critic, and highlights the unique advantages of deep neural networks, focusing on visual understanding via RL.
Towards Sample Efficient Reinforcement Learning
This survey shares an understanding of the problem and discusses possible ways to alleviate the sample cost of reinforcement learning, covering exploration, optimization, environment modeling, experience transfer, and abstraction.
Is multiagent deep reinforcement learning the answer or the question? A brief survey
This article provides a clear overview of current multiagent deep reinforcement learning (MDRL) literature, offers guidelines to complement this emerging area by showcasing how methods and algorithms from DRL and multiagent learning (MAL) have helped solve problems in MDRL, and distills general lessons learned from these works.
Playing FPS Games with Deep Reinforcement Learning
This paper presents the first architecture to tackle 3D environments in first-person shooter games, which involve partially observable states; it substantially outperforms the game's built-in AI agents as well as humans in deathmatch scenarios.
Loss is its own Reward: Self-Supervision for Reinforcement Learning
This work considers a range of self-supervised tasks that incorporate states, actions, and successors to provide auxiliary losses that offer ubiquitous and instantaneous supervision for representation learning even in the absence of reward.
Human-level control through deep reinforcement learning
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
An Introduction to Deep Reinforcement Learning
This manuscript provides an introduction to deep reinforcement learning models, algorithms, and techniques, with particular focus on aspects related to generalization and how deep RL can be used for practical applications.