Corpus ID: 30658962

The Eigenoption-Critic Framework

Miao Liu, Marlos C. Machado, G. Tesauro, Murray Campbell
Eigenoptions (EOs) have recently been introduced as a promising idea for generating a diverse set of options through the graph Laplacian, and have been shown to enable efficient exploration. Despite these promising initial results, two issues in current algorithms limit their application: (1) EO methods require two separate steps (eigenoption discovery and reward maximization) to learn a control policy, which can incur a significant amount of storage and computation; (2) EOs are only…
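The graph-Laplacian recipe behind eigenoptions can be sketched in a few lines. The following is a minimal illustration on a hypothetical 4-state chain; the state space, adjacency matrix, and helper names are invented for illustration and follow the general eigenoption recipe, not this paper's exact algorithm:

```python
# Minimal eigenoption sketch on a hypothetical 4-state chain MDP.
import numpy as np

# Adjacency matrix of the state-transition graph (undirected chain 0-1-2-3).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))   # degree matrix
L = D - A                    # combinatorial graph Laplacian

# Eigenvectors of L with small eigenvalues vary smoothly over the graph;
# each defines an "eigenpurpose": an intrinsic reward for moving in the
# direction of increasing (or decreasing) eigenvector value.
eigvals, eigvecs = np.linalg.eigh(L)
e = eigvecs[:, 1]            # first non-trivial eigenvector

def eigenpurpose_reward(s, s_next, e=e):
    """Intrinsic reward r(s, s') = e[s'] - e[s]."""
    return e[s_next] - e[s]
```

Maximizing this intrinsic reward drives the agent from one end of the chain to the other, which is why the resulting options tend to traverse the graph and aid exploration.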
Variational Option Discovery Algorithms
A tight connection between variational option discovery methods and variational autoencoders is highlighted; Variational Autoencoding Learning of Options by Reinforcement (VALOR), a new method derived from this connection, is introduced, and a curriculum learning approach is proposed.
Efficient Exploration in Reinforcement Learning through Time-Based Representations
In the reinforcement learning (RL) problem an agent must learn how to act optimally through trial-and-error interactions with a complex, unknown, stochastic environment.
Unsupervised State-space Decomposition in Hierarchical Reinforcement Learning
It would be interesting to determine whether the unsupervised goal determination converges towards the optimal solution and has any impact on the running performance of the algorithm.
Model primitives for hierarchical lifelong reinforcement learning
A framework is presented for using diverse suboptimal world models to decompose complex task solutions into simpler modular subpolicies; a series of experiments on high-dimensional continuous-action control tasks demonstrates the effectiveness of this approach at both complex single-task learning and lifelong learning.
Optimal Options for Multi-Task Reinforcement Learning Under Time Constraints
This work directly searches for optimal option sets, showing that the discovered options differ significantly depending on factors such as the available learning-time budget, and that the found options outperform popular option-generation heuristics.
Learning Plannable Representations with Causal InfoGAN
This work asks how to imagine goal-directed visual plans: a plausible sequence of observations that transitions a dynamical system from its current configuration to a desired goal state, which can later be used as a reference trajectory for control.
Constraint Satisfaction Propagation: Non-stationary Policy Synthesis for Temporal Logic Planning
This work demonstrates a logic-compatible approach using model-based knowledge of environment dynamics and deadline information to directly infer non-stationary policies composed of reusable stationary policies, constructed to maximize the probability of satisfying time-sensitive goals while respecting time-varying obstacles.
Skill Discovery for Exploration and Planning using Deep Skill Graphs
This work proposes a novel algorithm, Deep Skill Graphs, for acquiring a minimal representation of an environment that can be used to drive the agent to novel goals at test time, requiring little-to-no additional learning.
Option Discovery using Deep Skill Chaining
It is demonstrated that deep skill chaining significantly outperforms both non-hierarchical agents and other state-of-the-art skill discovery techniques in challenging continuous control tasks.
The value of abstraction
Three ways in which abstractions can guide learning are discussed, including domain structure and representational simplicity, which facilitate efficient learning by guiding exploration and generalization in RL.


Eigenoption Discovery through the Deep Successor Representation
This paper proposes an algorithm that discovers eigenoptions while learning non-linear state representations from raw pixels, and exploits recent successes in the deep reinforcement learning literature and the equivalence between proto-value functions and the successor representation.
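The successor representation (SR) underlying this equivalence can be learned in the tabular case with a simple TD(0) rule. Below is a minimal sketch on a hypothetical 4-state ring under a uniform random walk; the deep, pixel-based variant in the paper is not reproduced, and all names here are illustrative:

```python
# Tabular successor-representation sketch on a hypothetical 4-state ring.
import numpy as np

n, gamma, alpha = 4, 0.9, 0.1
# psi[s, s2] estimates the expected discounted number of visits to s2
# when starting from s and following the random-walk policy.
psi = np.zeros((n, n))
rng = np.random.default_rng(0)

s = 0
for _ in range(20000):
    # Uniform random walk on the ring 0-1-2-3-0.
    s_next = (s + rng.choice([-1, 1])) % n
    onehot = np.eye(n)[s]
    # TD(0) update toward the one-step bootstrap target 1{s} + gamma*psi(s').
    psi[s] += alpha * (onehot + gamma * psi[s_next] - psi[s])
    s = s_next
```

Each row of the converged SR sums to 1/(1-gamma), and its leading eigenvectors play the same role as proto-value functions, which is the connection the paper exploits.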
A Laplacian Framework for Option Discovery in Reinforcement Learning
This paper addresses the option discovery problem by showing how proto-value functions (PVFs) implicitly define options, introducing eigenpurposes: intrinsic reward functions derived from the learned representations that traverse the principal directions of the state space.
Probabilistic inference for determining options in reinforcement learning
The proposed approach is based on parametric option representations and works well in combination with current policy search methods, which are particularly well suited for continuous real-world tasks.
Reinforcement Learning with Unsupervised Auxiliary Tasks
This paper significantly outperforms the previous state of the art on Atari, averaging 880% expert human performance, and on a challenging suite of first-person, three-dimensional Labyrinth tasks, leading to a mean speedup in learning of 10× and averaging 87% expert human performance on Labyrinth.
Stochastic Neural Networks for Hierarchical Reinforcement Learning
This work proposes a general framework that first learns useful skills in a pre-training environment, and then leverages the acquired skills for learning faster in downstream tasks, and uses Stochastic Neural Networks combined with an information-theoretic regularizer to efficiently pre-train a large span of skills.
Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes
A novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies is introduced, and several strategies for scaling the proposed framework to large MDPs are outlined.
Human-level control through deep reinforcement learning
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Policy Gradient Methods for Reinforcement Learning with Function Approximation
This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.
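The likelihood-ratio gradient this result builds on can be shown in a few lines. The following is a minimal REINFORCE sketch on a hypothetical two-armed bandit; the task, reward means, and parameters are invented for illustration and this is not the paper's compatible-function-approximation construction:

```python
# Minimal REINFORCE sketch on a hypothetical two-armed bandit.
import numpy as np

rng = np.random.default_rng(1)
theta = np.zeros(2)                  # logits of a softmax policy over 2 arms
true_means = np.array([0.2, 0.8])    # arm 1 pays more on average
alpha = 0.1

for _ in range(2000):
    probs = np.exp(theta - theta.max())
    probs /= probs.sum()
    a = rng.choice(2, p=probs)
    r = rng.normal(true_means[a], 0.1)
    # For a softmax policy, grad log pi(a) = one-hot(a) - probs.
    grad_log_pi = np.eye(2)[a] - probs
    theta += alpha * r * grad_log_pi  # REINFORCE update
```

Because the update follows an unbiased estimate of the policy gradient, the softmax policy drifts toward the better-paying arm; adding a baseline (as in actor-critic methods) would reduce the variance of the same estimator.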
Benchmarking Deep Reinforcement Learning for Continuous Control
This work presents a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure.
Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents
A big-picture look at how the Arcade Learning Environment is being used by the research community is taken, revisiting challenges posed when the ALE was introduced, summarizing the state of the art in various problems, and highlighting problems that remain open.