Corpus ID: 47019263

Context-Aware Policy Reuse

@inproceedings{Li2019ContextAwarePR,
  title={Context-Aware Policy Reuse},
  author={Siyuan Li and Fangda Gu and Guangxiang Zhu and Chongjie Zhang},
  booktitle={AAMAS},
  year={2019}
}
Transfer learning can greatly speed up reinforcement learning for a new task by leveraging policies of relevant tasks. Existing work on policy reuse either focuses only on selecting a single best source policy for transfer, without considering contexts, or cannot guarantee learning an optimal policy for the target task. To improve transfer efficiency and guarantee optimality, we develop a novel policy reuse method, called Context-Aware Policy reuSe (CAPS), that enables multi-policy transfer. Our…
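The truncated abstract describes selecting among multiple source policies based on the current state (context). A minimal, hypothetical sketch of such context-dependent selection; the function name, scoring scheme, and epsilon-greedy exploration are illustrative assumptions, not CAPS's actual algorithm:

```python
import random

def caps_style_action(state, source_policies, scores, epsilon=0.1):
    """Context-aware reuse sketch: pick the source policy whose estimated
    value is highest *in this state*, then act with it; explore uniformly
    with probability epsilon. CAPS additionally guarantees optimality by
    including the target policy being learned as one of the reuse options.

    scores: dict mapping policy index -> callable(state) -> estimated value.
    """
    if random.random() < epsilon:
        idx = random.randrange(len(source_policies))
    else:
        idx = max(range(len(source_policies)), key=lambda i: scores[i](state))
    return source_policies[idx](state)
```

The key difference from single-policy reuse is that the argmax is taken per state, so different source policies can be reused in different regions of the state space.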
Citations

Contextual Policy Reuse using Deep Mixture Models
A novel deep mixture model formulation for learning a state-dependent prior over source-task dynamics that matches the target dynamics using only state trajectories obtained while learning the target policy.
LISPR: An Options Framework for Policy Reuse with Reinforcement Learning
This framework performs excellently in sparse reward problems given (sub-)optimal source policies and improves upon prior art in transfer methods such as continual learning and progressive networks, which lack the framework's desirable theoretical properties.
Lifetime policy reuse and the importance of task capacity
This paper presents a first approach to lifetime-scalable policy reuse by selecting the number of policies based on task capacity, and suggests using D(R)QN for larger and PPO for smaller library sizes.
MULTIPOLAR: Multi-Source Policy Aggregation for Transfer Reinforcement Learning between Diverse Environmental Dynamics
This work explores a new challenge in transfer RL, where only a set of source policies collected under unknown diverse dynamics is available for learning a target task efficiently, and proposes MULTI-source POLicy AggRegation (MULTIPOLAR), which learns to adaptively aggregate the actions provided by the source policies to maximize target task performance.
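The aggregation idea above can be sketched as a learnable weighted combination of source-policy actions plus a learnable residual correction. The signature and shapes below are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def multipolar_action(state, source_policies, weights, residual):
    """Aggregate continuous actions from K source policies with learnable
    weights, then add a residual term trained on the target task.
    Illustrative sketch only; the actual method also learns
    per-action-dimension scaling of each source policy's output."""
    actions = np.stack([pi(state) for pi in source_policies])  # (K, action_dim)
    return weights @ actions + residual(state)                 # (action_dim,)
```

Because the weights and residual are the only trainable parts, the (possibly black-box) source policies never need to be modified or even understood.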
Efficient Use of Heuristics for Accelerating XCS-Based Policy Learning in Markov Games
Proposes an algorithm that can efficiently learn explainable and generalized action selection rules by taking advantage of the representation of quantitative heuristics and an opponent model with an eXtended classifier system (XCS) in zero-sum Markov games.
Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards
This paper proposes an HRL framework which sets auxiliary rewards for low-level skill training based on the advantage function of the high-level policy, and theoretically proves that optimizing low-level skills with this auxiliary reward increases the task return of the joint policy.
AC-Teach: A Bayesian Actor-Critic Method for Policy Learning with an Ensemble of Suboptimal Teachers
The proposed approach, Actor-Critic with Teacher Ensembles (AC-Teach), is the first to work with an ensemble of suboptimal teachers that may solve only part of the problem or contradict each other, forming a unified algorithmic solution that is compatible with a broad range of teacher ensembles.
Discovering Generalizable Skills via Automated Generation of Diverse Tasks
Proposes Skill Learning In Diversified Environments (SLIDE), a method to discover generalizable skills via automated generation of a diverse set of tasks; the learned skills effectively improve the robot's performance in various unseen target tasks compared to existing reinforcement learning and skill-learning methods.
State of the Art on: Transfer learning in reinforcement learning
Machine learning (ML) is a subfield of artificial intelligence whose aim is the design of algorithms able to learn from data exploiting statistical tools. In reinforcement learning (RL) an agent acts…
References

Showing 1–10 of 46 references
Learning domain structure through probabilistic policy reuse in reinforcement learning
This work demonstrates that Policy Reuse further contributes to learning the structure of a domain, and shows theoretically that, under a set of conditions, reusing such a set of core policies bounds the minimal expected gain received while learning a new policy.
An Optimal Online Method of Selecting Source Policies for Reinforcement Learning
This paper develops an optimal online method to select source policies for reinforcement learning, which formulates online source policy selection as a multi-armed bandit problem and augments Q-learning with policy reuse.
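The bandit formulation above can be sketched with a standard UCB1 selector over the policy library: each arm is a source policy and the arm's reward is the return of an episode guided by that policy. This is a generic UCB1 sketch, not the paper's exact selection criterion:

```python
import math

class PolicyBandit:
    """UCB1 over a library of source policies (generic sketch)."""
    def __init__(self, n_policies):
        self.counts = [0] * n_policies   # pulls per policy
        self.means = [0.0] * n_policies  # running mean episode return
        self.t = 0                       # total selections made

    def select(self):
        """Return the index of the source policy to reuse this episode."""
        self.t += 1
        for i, c in enumerate(self.counts):
            if c == 0:
                return i  # try every policy once before using UCB scores
        return max(range(len(self.counts)),
                   key=lambda i: self.means[i]
                   + math.sqrt(2 * math.log(self.t) / self.counts[i]))

    def update(self, i, episode_return):
        """Incorporate the return observed after reusing policy i."""
        self.counts[i] += 1
        self.means[i] += (episode_return - self.means[i]) / self.counts[i]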
Bayesian policy reuse
Formalises the problem of policy reuse and presents an algorithm for efficiently responding to a novel task instance by reusing a policy from a library of existing policies, where the choice is based on observed 'signals' that correlate with policy performance.
Regret Bounds for Reinforcement Learning with Policy Advice
It is proved that RLPA has a sub-linear regret of $\widetilde O(\sqrt{T})$ relative to the best input policy, and that both this regret and its computational complexity are independent of the size of the state and action space.
Unsupervised Cross-Domain Transfer in Policy Gradient Reinforcement Learning via Manifold Alignment
An autonomous framework is introduced that uses unsupervised manifold alignment to learn intertask mappings and effectively transfer samples between different task domains; its effectiveness for cross-domain transfer is demonstrated.
Successor Features for Transfer in Reinforcement Learning
This work proposes a transfer framework for the scenario where the reward function changes between tasks but the environment's dynamics remain the same, and derives two theorems that set the approach on firm theoretical ground, with experiments showing that it successfully promotes transfer in practice.
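The successor-feature decomposition behind this framework can be illustrated numerically: when the reward factors as $r = \phi \cdot w$, the action values factor as $Q^\pi = \psi^\pi \cdot w$, where $\psi^\pi$ are expected discounted feature sums under $\pi$. A reward change then only requires recombining the cached $\psi$ with the new $w$ (toy numbers, illustrative only):

```python
import numpy as np

# Successor features for one state with two actions and two reward
# features: psi[a] is the expected discounted feature sum when taking
# action a and then following the policy.
psi = np.array([[1.0, 0.5],   # action 0
                [0.2, 2.0]])  # action 1

w_old = np.array([1.0, 0.0])  # old task: only feature 0 is rewarded
w_new = np.array([0.0, 1.0])  # new task: only feature 1 is rewarded

# Transfer is a dot product, not a re-learned value function:
q_old = psi @ w_old  # [1.0, 0.2] -> action 0 is greedy
q_new = psi @ w_new  # [0.5, 2.0] -> action 1 is greedy
```

The dynamics-dependent part (psi) is computed once and reused; only the reward weights w change between tasks.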
Learning with Options that Terminate Off-Policy
A new algorithm, Q($\beta$), learns the solution with respect to any termination condition, regardless of how the options actually terminate, by casting learning with options into a common framework with well-studied multi-step off-policy learning.
An automated measure of MDP similarity for transfer in reinforcement learning
A data-driven automated similarity measure for Markov Decision Processes, based on the reconstruction error of a restricted Boltzmann machine that attempts to model the behavioral dynamics of the two MDPs being compared; the measure can be used to identify similar source tasks for transfer learning.
Transfer learning with probabilistic mapping selection
Experimental results show that the use of multiple inter-task mappings, accompanied by a probabilistic selection mechanism, can significantly boost the performance of transfer learning relative to 1) learning without transfer and 2) using a single hand-picked mapping.
Transfer Reinforcement Learning with Shared Dynamics
This article addresses a particular transfer reinforcement learning problem, in which dynamics do not change from one task to another and only the reward function does; it relies on the optimism-in-the-face-of-uncertainty principle, using upper-bound reward estimates.