Corpus ID: 51881964

Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement

@article{Barreto2018TransferID,
  title={Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement},
  author={Andr{\'e} Barreto and Diana Borsa and John Quan and Tom Schaul and David Silver and Matteo Hessel and Daniel Jaymin Mankowitz and Augustin Z{\'i}dek and R{\'e}mi Munos},
  journal={ArXiv},
  year={2018},
  volume={abs/1901.10964}
}
The ability to transfer skills across tasks has the potential to scale up reinforcement learning (RL) agents to environments currently out of reach. Recently, a framework based on two ideas, successor features (SFs) and generalised policy improvement (GPI), has been introduced as a principled way of transferring skills. In this paper we extend the SFs & GPI framework in two ways. One of the basic assumptions underlying the original formulation of SFs & GPI is that rewards for all tasks of…
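To make the transfer recipe concrete: when rewards decompose (at least approximately) as r = φ(s, a, s′) · w, every previously learned policy π_i can be evaluated on a new task w through its successor features, Q^{π_i}(s, a) = ψ^{π_i}(s, a) · w, and GPI acts greedily with respect to the maximum of these estimates. Below is a minimal sketch of that selection rule; the array shapes and names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def gpi_action(psi, w):
    """Select an action by generalised policy improvement (GPI).

    psi : (n_policies, n_actions, d) successor features psi_i(s, a)
          of each stored policy, evaluated at the current state s.
    w   : (d,) task weights, assuming rewards satisfy r ~= phi . w.

    Each old policy is evaluated on the new task via
    Q_i(s, a) = psi_i(s, a) . w, and the agent acts greedily with
    respect to max_i Q_i(s, a).
    """
    q = psi @ w                         # (n_policies, n_actions)
    return int(q.max(axis=0).argmax())  # max over policies, argmax over actions

# Example: 3 stored policies, 4 actions, 5-dimensional features (made up).
rng = np.random.default_rng(0)
print(gpi_action(rng.normal(size=(3, 4, 5)), rng.normal(size=5)))
```

Acting on the pointwise maximum over the policy library is what gives GPI its guarantee: the induced policy performs no worse on the new task than any single policy in the library.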
Citations

Universal Successor Features for Transfer Reinforcement Learning
TLDR: This paper proposes Universal Successor Features (USFs) to capture the underlying dynamics of the environment while allowing generalization to unseen goals, and proposes a flexible end-to-end model of USFs that can be trained by interacting with the environment.
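As a rough illustration of the idea, a single function approximator can output SFs conditioned on the task or goal vector, so that one model covers many tasks. The PyTorch module below is a hedged sketch under that reading; the MLP architecture, layer sizes, and the choice to tie the task vector's dimension to the feature dimension are all our own assumptions.

```python
import torch
import torch.nn as nn

class USF(nn.Module):
    """psi(s, w): successor features conditioned on state s and task w."""
    def __init__(self, state_dim, task_dim, n_actions, feat_dim, hidden=64):
        super().__init__()
        # task_dim == feat_dim so that Q = psi . w is well defined.
        self.net = nn.Sequential(
            nn.Linear(state_dim + task_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions * feat_dim),
        )
        self.n_actions, self.feat_dim = n_actions, feat_dim

    def forward(self, s, w):
        x = torch.cat([s, w], dim=-1)
        psi = self.net(x).view(-1, self.n_actions, self.feat_dim)
        q = torch.einsum("bad,bd->ba", psi, w)  # Q(s, a; w) = psi . w
        return psi, q

# Example: batch of 2 states on tasks described by 5-dimensional w vectors.
model = USF(state_dim=8, task_dim=5, n_actions=4, feat_dim=5)
psi, q = model(torch.randn(2, 8), torch.randn(2, 5))
print(psi.shape, q.shape)  # (2, 4, 5), (2, 4)
```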
Universal Successor Features Approximators
TLDR: This work discusses the challenges involved in training a USFA and its generalisation properties, and demonstrates its practical benefits and transfer abilities on a large-scale domain in which the agent has to navigate a first-person, three-dimensional environment.
Entropic Policy Composition with Generalized Policy Improvement and Divergence Correction
TLDR: This work extends generalized policy improvement to the max-entropy framework, introduces a method for the practical implementation of successor features in continuous action spaces, and proposes a novel approach which, in principle, recovers the optimal policy during transfer.
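The cited work develops a divergence-corrected composition rule in the max-entropy setting; as a loose stand-in (our assumption, not the paper's method), one can picture a soft analogue of GPI that replaces the hard argmax over actions with a Boltzmann policy over the per-action maximum of the Q-values.

```python
import numpy as np

def soft_gpi_policy(q, temperature=1.0):
    """q: (n_policies, n_actions) action values at the current state.

    Returns a stochastic policy over actions: the GPI upper bound
    max_i Q_i(s, a) is passed through a softmax instead of an argmax.
    """
    best = q.max(axis=0)               # per-action max over stored policies
    logits = best / temperature
    p = np.exp(logits - logits.max())  # numerically stable softmax
    return p / p.sum()

print(soft_gpi_policy(np.array([[1.0, 2.0], [1.5, 0.5]])))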
Xi-Learning: Successor Feature Transfer Learning for General Reward Functions
TLDR: This work proposes a novel SF mechanism, ξ-learning, based on learning the cumulative discounted probability of successor features, introduces two ξ-learning variations, proves its convergence, and provides a guarantee on its transfer performance.
Fast reinforcement learning with generalized policy updates
TLDR: It is argued that complex decision problems can be naturally decomposed into multiple tasks that unfold in sequence or in parallel, and that associating each task with a reward function can be seamlessly accommodated within the standard reinforcement-learning formalism.
State2vec: Off-Policy Successor Features Approximators
A major challenge in reinforcement learning (RL) is the design of agents that are able to generalize across tasks that share common dynamics. A viable solution is meta-reinforcement learning, which…
Sequential Transfer in Reinforcement Learning with a Generative Model
TLDR: This work designs an algorithm that quickly identifies an accurate solution by seeking the state-action pairs that are most informative for this purpose, and derives PAC bounds on its sample complexity which clearly demonstrate the benefits of using this kind of prior knowledge.
Successor Feature Neural Episodic Control
TLDR: A combination of episodic control and successor features in a single reinforcement learning framework is outlined and its benefits are empirically illustrated.
Risk-Aware Transfer in Reinforcement Learning using Successor Features
TLDR: This paper addresses the problem of risk-aware policy transfer between tasks in a common domain that differ only in their reward functions, where risk is measured by the variance of reward streams, and develops risk-aware successor features that integrate seamlessly within the RL framework and inherit the superior task generalization ability of SFs.
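A hedged sketch of the flavour of decision rule this suggests: alongside the mean value that standard GPI maximises, each stored policy supplies a variance estimate for the current state, and the agent acts greedily on a risk-adjusted value. The penalty weight lam and both estimators are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def risk_aware_gpi_action(q, var, lam=0.5):
    """q, var: (n_policies, n_actions) mean and variance estimates."""
    adjusted = q - lam * var            # penalise high-variance returns
    return int(adjusted.max(axis=0).argmax())

q = np.array([[1.0, 2.0], [1.5, 0.5]])
var = np.array([[0.1, 4.0], [0.2, 0.1]])
print(risk_aware_gpi_action(q, var))    # action 1's high value is risk-discounted away
```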
A New Representation of Successor Features for Transfer across Dissimilar Environments
TLDR: An approach based on successor features for which convergence is shown, together with a bounded error on modelling successor feature functions with Gaussian processes in environments with both different dynamics and rewards.

References

Showing 1–10 of 36 references.
Successor Features for Transfer in Reinforcement Learning
TLDR: This work proposes a transfer framework for the scenario where the reward function changes between tasks but the environment's dynamics remain the same, derives two theorems that set the approach on firm theoretical ground, and presents experiments that show it successfully promotes transfer in practice.
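In this framework the reward is assumed to be (approximately) linear in a shared feature vector, r ≈ φ(s, a, s′) · w, so identifying a new task reduces to regressing w from observed transitions. A minimal numpy sketch, assuming (φ, r) pairs have already been collected from experience:

```python
import numpy as np

def fit_task_weights(phis, rewards):
    """phis: (n, d) features phi(s, a, s'); rewards: (n,) observed rewards."""
    w, *_ = np.linalg.lstsq(phis, rewards, rcond=None)
    return w

# Synthetic check: recover made-up task weights from noisy rewards.
rng = np.random.default_rng(0)
phis = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
rewards = phis @ w_true + 0.01 * rng.normal(size=100)
print(np.allclose(fit_task_weights(phis, rewards), w_true, atol=0.05))
```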
Deep Successor Reinforcement Learning
TLDR: DSR is presented, which generalizes successor representations within an end-to-end deep reinforcement learning framework and has several appealing properties, including increased sensitivity to distal reward changes due to the factorization of reward and world dynamics, and the ability to extract bottleneck states from successor maps trained under a random policy.
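The successor representation that DSR builds on has a simple tabular form: the SF of a state-action pair bootstraps on the SF of the next pair, exactly like a Q-learning target with a feature vector in place of the scalar reward. A toy sketch; the one-hot features, step size, and discount are illustrative assumptions.

```python
import numpy as np

def sr_td_update(psi, phi, s, a, s_next, a_next, alpha=0.1, gamma=0.95):
    """psi: (n_states, n_actions, d) SF table; phi: (n_states, d) features.

    Uses the phi_{t+1} convention: the feature observed on the transition
    is that of the successor state.
    """
    target = phi[s_next] + gamma * psi[s_next, a_next]
    psi[s, a] += alpha * (target - psi[s, a])

# Toy usage: 3 states, 2 actions, one-hot state features.
n_s, n_a = 3, 2
psi = np.zeros((n_s, n_a, n_s))
phi = np.eye(n_s)
sr_td_update(psi, phi, s=0, a=1, s_next=2, a_next=0)
print(psi[0, 1])  # begins to accumulate discounted occupancy of state 2
```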
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
TLDR: A new distributed agent, IMPALA (Importance Weighted Actor-Learner Architecture), is developed that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation.
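At the heart of IMPALA is the V-trace off-policy correction, which builds value targets from trajectories generated by a behaviour policy μ using clipped importance weights ρ̄ and c̄. The numpy sketch below follows the published target v_s = V(x_s) + Σ_t γ^{t−s} (Π c_i) δ_t V for a single trajectory; batching and the surrounding actor-learner machinery are omitted, and the shapes are our assumptions.

```python
import numpy as np

def vtrace_targets(rewards, values, value_next, rhos, gamma=0.99,
                   rho_bar=1.0, c_bar=1.0):
    """rewards, values, rhos: (T,) arrays; rhos are pi/mu ratios.
    value_next: V at the state following step T-1."""
    T = len(rewards)
    clipped_rho = np.minimum(rho_bar, rhos)
    cs = np.minimum(c_bar, rhos)
    values_tp1 = np.append(values[1:], value_next)
    deltas = clipped_rho * (rewards + gamma * values_tp1 - values)
    vs = values.astype(float).copy()
    acc = 0.0
    for t in reversed(range(T)):                 # v_s - V(x_s) = delta_s + gamma c_s (v_{s+1} - V(x_{s+1}))
        acc = deltas[t] + gamma * cs[t] * acc
        vs[t] = values[t] + acc
    return vs

# On-policy check (rho = c = 1): V-trace reduces to n-step returns.
print(vtrace_targets(np.ones(4), np.zeros(4), 0.0, np.ones(4)))
```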
Policy transfer via modularity and reward guiding
TLDR: This work explores the use of reinforcement learning to train a robot to robustly push an object and proposes to use modularity to separate the learned policy from the raw inputs and outputs; rather than training “end-to-end,” the system is decomposed into modules and only a subset of these modules is trained in simulation.
FeUdal Networks for Hierarchical Reinforcement Learning
We introduce FeUdal Networks (FuNs): a novel architecture for hierarchical reinforcement learning. Our approach is inspired by the feudal reinforcement learning proposal of Dayan and Hinton, and…
Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning
TLDR: A new RL problem is introduced in which the agent should learn to execute sequences of instructions after learning useful skills that solve subtasks, and a new neural architecture for the meta controller is proposed that learns when to update the subtask, which makes learning more efficient.
Deep reinforcement learning with successor features for navigation across similar environments
TLDR: This paper proposes a successor-feature-based deep reinforcement learning algorithm that can learn to transfer knowledge from previously mastered navigation tasks to new problem instances and substantially decreases the required learning time after the first task instance has been solved.
The Option-Critic Architecture
TLDR: This work derives policy gradient theorems for options and proposes a new option-critic architecture capable of learning both the internal policies and the termination conditions of options, in tandem with the policy over options, and without the need to provide any additional rewards or subgoals.
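For context, options execute in call-and-return fashion: an option's internal policy picks actions until its termination condition fires, at which point the policy over options chooses anew; the option-critic learns the internal policies and terminations jointly. The toy loop below sketches only this execution model, with a made-up chain environment and callables standing in for the learned components.

```python
import numpy as np

rng = np.random.default_rng(0)

class Chain:
    """Toy 1-D chain, only here to make the sketch executable."""
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):
        self.s = min(self.s + a, 10)
        return self.s, float(self.s == 10), self.s == 10

def run_episode(env, choose_option, pi_o, beta_o, max_steps=50):
    s, ret = env.reset(), 0.0
    o = choose_option(s)                  # policy over options
    for _ in range(max_steps):
        a = pi_o(o, s)                    # intra-option policy
        s, r, done = env.step(a)
        ret += r
        if done:
            break
        if rng.random() < beta_o(o, s):   # option terminates here
            o = choose_option(s)          # re-choose an option
    return ret

print(run_episode(Chain(),
                  choose_option=lambda s: 1,
                  pi_o=lambda o, s: o,    # option index doubles as the action
                  beta_o=lambda o, s: 0.1))
```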
Distral: Robust multitask reinforcement learning
TLDR: This work proposes a new approach for joint training of multiple tasks, referred to as Distral (Distill & transfer learning), and shows that the proposed learning process is more robust and more stable, attributes that are critical in deep reinforcement learning.
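Loosely following Distral's entropy-regularised objective, each task policy π_i is trained on a shaped reward that pulls it toward a shared distilled policy π_0. The helper below is a hedged sketch of that shaping; the coefficients α and β and the function name are assumptions for illustration.

```python
import numpy as np

def distral_shaped_reward(r, logp_task, logp_shared, alpha=0.5, beta=5.0):
    """Shaped reward r + (alpha/beta) log pi_0(a|s) - (1/beta) log pi_i(a|s):
    the log pi_0 term rewards actions the shared policy likes, while the
    -log pi_i term acts as an entropy bonus for the task policy."""
    return r + (alpha / beta) * logp_shared - (1.0 / beta) * logp_task

print(distral_shaped_reward(1.0, np.log(0.5), np.log(0.25)))
```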
Transfer in Reinforcement Learning: A Framework and a Survey
A. Lazaric, in Reinforcement Learning, 2012
TLDR: This chapter provides a formalization of the general transfer problem, the main settings which have been investigated so far, and the most important approaches to transfer in reinforcement learning.