Corpus ID: 23556038

Deep Abstract Q-Networks

@inproceedings{Roderick2018DeepAQ,
  title={Deep Abstract Q-Networks},
  author={Melrose Roderick and Christopher Grimm and Stefanie Tellex},
  booktitle={AAMAS},
  year={2018}
}
We examine the problem of learning and planning on high-dimensional domains with long horizons and sparse rewards. Recent approaches have shown great successes in many Atari 2600 domains. However, domains with long horizons and sparse rewards, such as Montezuma's Revenge and Venture, remain challenging for existing methods. Methods using abstraction (Dietterich 2000; Sutton, Precup, and Singh 1999) have been shown to be useful in tackling long-horizon problems. We combine recent techniques of deep…
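
One way to read the approach the abstract alludes to (a coarse abstraction over the state space, with deep RL handling low-level control) is sketched below. Everything here, including the toy abstraction function `phi`, the placeholder `SubPolicy`, and the `AbstractAgent` class, is an illustrative assumption rather than the paper's actual algorithm: a tabular learner chooses which neighboring abstract state to head for, and a DQN-style sub-policy would be trained to carry out each abstract transition.

```python
import random
from collections import defaultdict

def phi(state):
    """Toy abstraction: map a fine-grained (x, y) position to a coarse grid cell."""
    x, y = state
    return (x // 5, y // 5)

class SubPolicy:
    """Stand-in for a DQN-style low-level policy trained to reach a target abstract state.
    Here it only takes random primitive actions; a real implementation would learn."""
    def __init__(self, target_abstract_state):
        self.target = target_abstract_state

    def act(self, state, actions):
        return random.choice(actions)

class AbstractAgent:
    """Tabular Q-learning over (abstract state, abstract action) pairs."""
    def __init__(self, alpha=0.1, gamma=0.99, epsilon=0.1):
        self.q = defaultdict(float)   # value of moving from one abstract state to a neighbor
        self.subpolicies = {}         # one low-level sub-policy per abstract transition
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose_abstract_action(self, abs_state, neighbors):
        # epsilon-greedy choice of which neighboring abstract state to aim for
        if random.random() < self.epsilon:
            return random.choice(neighbors)
        return max(neighbors, key=lambda n: self.q[(abs_state, n)])

    def subpolicy_for(self, abs_state, abs_action):
        # lazily create the sub-policy responsible for this abstract transition
        key = (abs_state, abs_action)
        if key not in self.subpolicies:
            self.subpolicies[key] = SubPolicy(abs_action)
        return self.subpolicies[key]

    def update(self, abs_state, abs_action, reward, next_abs_state, next_neighbors):
        # one-step Q-learning update at the abstract level
        best_next = max((self.q[(next_abs_state, n)] for n in next_neighbors), default=0.0)
        target = reward + self.gamma * best_next
        self.q[(abs_state, abs_action)] += self.alpha * (target - self.q[(abs_state, abs_action)])
```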

Citations

SAGE: Generating Symbolic Goals for Myopic Models in Deep Reinforcement Learning

TLDR
This work proposes SAGE, an algorithm that combines learning and planning to exploit a previously unusable class of incomplete models and outperforms competing methods on variations of taxi world and Minecraft.

Hierarchical Imitation and Reinforcement Learning

TLDR
This work proposes an algorithmic framework, called hierarchical guidance, that leverages the hierarchical structure of the underlying problem to integrate different modes of expert interaction and can incorporate different combinations of imitation learning and reinforcement learning at different levels, leading to dramatic reductions in both expert effort and cost of exploration.

Learning Abstract Models for Strategic Exploration and Fast Reward Transfer

TLDR
This work constructs an abstract Markov Decision Process, which grows through strategic exploration via planning and is backed by learned subpolicies that navigate between abstract states, and achieves strong results on three of the hardest Arcade Learning Environment games.

Abstract Value Iteration for Hierarchical Reinforcement Learning

TLDR
This work proposes a novel hierarchical reinforcement learning framework for control with continuous state and action spaces, together with two algorithms for planning in the resulting abstract decision process (ADP), including a practical one that interweaves planning at the abstract level and learning at the concrete level.

Programmatic Reinforcement Learning without Oracles

TLDR
This work proposes a programmatically interpretable RL framework that conducts program architecture search on top of a continuous relaxation of the architecture space defined by programming language grammar rules, and allows policy architectures to be learned with policy parameters via bilevel optimization using efficient policy-gradient methods, and thus does not require a pretrained oracle.

Discrete State-Action Abstraction via the Successor Representation

TLDR
The proposed algorithm, Discrete State-Action Abstraction (DSAA), iteratively alternates between training a set of temporally extended actions in the form of options (i.e., an action abstraction of the underlying environment) and learning a discrete abstraction of the state space, and is able to explore the environment and solve provided tasks more efficiently than baseline reinforcement learning algorithms.

Fast Exploration with Simplified Models and Approximately Optimistic Planning in Model Based Reinforcement Learning

TLDR
A novel algorithm, Strategic Object-Oriented Reinforcement Learning (SOORL), is introduced that outperforms state-of-the-art algorithms in the game of Pitfall! in fewer than 50 episodes.

Deriving Subgoals Autonomously to Accelerate Learning in Sparse Reward Domains

TLDR
This work describes a new, autonomous approach for deriving subgoals from raw pixels that is more efficient than competing methods, and proposes a novel intrinsic reward scheme for exploiting the derived subgoals, applying it to three Atari games with sparse rewards.

Self-Imitation Learning via Trajectory-Conditioned Policy for Hard-Exploration Tasks

TLDR
This work proposes a new method of learning a trajectory-conditioned policy to imitate diverse trajectories from the agent's own past experience and shows that such self-imitation helps avoid myopic behavior and increases the chance of finding a globally optimal solution for hard-exploration tasks, especially when there are misleading rewards.

Uncertainty-sensitive Learning and Planning with Ensembles

TLDR
A reinforcement learning framework for discrete environments in which an agent makes both strategic and tactical decisions: the former through the use of a value function, while the latter is powered by a tree-search planner that combines uncertainty modelling and risk measurement.

References

Showing 10 of 16 references.

Deep Reinforcement Learning with Double Q-Learning

TLDR
This paper proposes a specific adaptation to the DQN algorithm and shows that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games.
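
The "specific adaptation" referred to here is Double Q-learning applied to DQN: the online network selects the greedy next action while the separate target network evaluates it, which reduces the upward bias introduced by the max operator. A minimal sketch, assuming `q_online` and `q_target` are callables returning per-action value lists (the names are illustrative):

```python
# Double DQN target: decouple action selection (online net) from evaluation (target net).
def double_dqn_target(reward, next_state, done, q_online, q_target, gamma=0.99):
    if done:
        return reward
    next_values = q_online(next_state)                            # online net picks the action
    best_action = max(range(len(next_values)), key=next_values.__getitem__)
    return reward + gamma * q_target(next_state)[best_action]    # target net evaluates it
```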

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

TLDR
h-DQN, a framework that integrates hierarchical value functions operating at different temporal scales with intrinsically motivated deep reinforcement learning, is presented; it allows for flexible goal specifications, such as functions over entities and relations.

Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics

TLDR
The Schema Network is introduced, an object-oriented generative physics simulator capable of disentangling multiple causes of events and reasoning backward through causes to achieve goals, and generalizing from limited data and learning causal relationships are essential abilities on the path toward generally intelligent systems.

Human-level control through deep reinforcement learning

TLDR
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning

FeUdal Networks for Hierarchical Reinforcement Learning

We introduce FeUdal Networks (FuNs): a novel architecture for hierarchical reinforcement learning. Our approach is inspired by the feudal reinforcement learning proposal of Dayan and Hinton, and…

The Option-Critic Architecture

TLDR
This work derives policy gradient theorems for options and proposes a new option-critic architecture capable of learning both the internal policies and the termination conditions of options, in tandem with the policy over options, and without the need to provide any additional rewards or subgoals.

Count-Based Exploration with Neural Density Models

Bellemare et al. (2016) introduced the notion of a pseudo-count, derived from a density model, to generalize count-based exploration to non-tabular reinforcement learning. This pseudo-count was used…
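
For context, the pseudo-count is derived from the density model's probability of a state before and after the model is updated on that state, and it is typically converted into an exploration bonus that decays roughly like one over the square root of the count. A rough sketch under those assumptions (the coefficient `beta` is illustrative):

```python
def pseudo_count(rho, rho_prime):
    # rho: density model probability of state x before updating on x
    # rho_prime: probability of x after a single update on x (the "recoding probability")
    return rho * (1.0 - rho_prime) / (rho_prime - rho)

def exploration_bonus(rho, rho_prime, beta=0.05):
    # Intrinsic reward that shrinks roughly like 1/sqrt(visit count).
    n_hat = pseudo_count(rho, rho_prime)
    return beta / (n_hat + 0.01) ** 0.5
```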

The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract)

TLDR
The promise of ALE is illustrated by developing and benchmarking domain-independent agents designed using well-established AI techniques for both reinforcement learning and planning, and an evaluation methodology made possible by ALE is proposed.

Roles of Macro-Actions in Accelerating Reinforcement Learning

TLDR
Although eligibility traces increased the rate of convergence to the optimal value function compared to learning with macro-actions but without eligibility traces, eligibility traces did not permit the optimal policy to be learned as quickly as it was using macro-actions.