Corpus ID: 235742951

AdaRL: What, Where, and How to Adapt in Transfer Reinforcement Learning

@article{Huang2021AdaRLWW,
  title={AdaRL: What, Where, and How to Adapt in Transfer Reinforcement Learning},
  author={Biwei Huang and Fan Feng and Chaochao Lu and Sara Magliacane and Kun Zhang},
  journal={ArXiv},
  year={2021},
  volume={abs/2107.02729}
}
Most approaches in reinforcement learning (RL) are data-hungry and specific to fixed environments. In this paper, we propose a principled framework for adaptive RL, called AdaRL, that adapts reliably to changes across domains. Specifically, we construct a generative environment model for the structural relationships among variables in the system and embed the changes in a compact way, which provides a clear and interpretable picture for locating what and where the changes are and how to adapt… 
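As a rough illustration of the kind of generative model the abstract describes, here is a minimal sketch of a factored dynamics model in which structural masks select the parents of each next-state variable and a compact per-domain vector theta_k carries the cross-domain changes. The class name, the sigmoid-relaxed masks, and the linear per-variable predictors are illustrative assumptions, not AdaRL's exact parameterization.

```python
import torch
import torch.nn as nn

class StructuralDynamics(nn.Module):
    def __init__(self, state_dim, action_dim, theta_dim, n_domains):
        super().__init__()
        # Structural masks over the parents of each next-state variable;
        # AdaRL-style methods learn such structure from data.
        self.state_mask = nn.Parameter(torch.randn(state_dim, state_dim))
        self.theta_mask = nn.Parameter(torch.randn(state_dim, theta_dim))
        # One compact, low-dimensional change vector per domain.
        self.theta = nn.Parameter(torch.zeros(n_domains, theta_dim))
        self.nets = nn.ModuleList(
            nn.Linear(state_dim + action_dim + theta_dim, 1)
            for _ in range(state_dim)
        )

    def forward(self, s, a, domain_idx):
        theta_k = self.theta[domain_idx]          # (batch, theta_dim)
        outs = []
        for i, net in enumerate(self.nets):
            # Each next-state variable sees only its (soft-)masked parents.
            s_i = s * torch.sigmoid(self.state_mask[i])
            th_i = theta_k * torch.sigmoid(self.theta_mask[i])
            outs.append(net(torch.cat([s_i, a, th_i], dim=-1)))
        return torch.cat(outs, dim=-1)            # predicted next state
```

Under this factorization, adapting to a new target domain would amount to estimating only a fresh row of `theta` from a handful of samples, with the learned structure and shared weights frozen, which is the "what, where, and how to adapt" picture the abstract sketches.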
Factored Adaptation for Non-Stationary Reinforcement Learning
TLDR
Experimental results demonstrate that FANS-RL outperforms existing approaches in terms of rewards, compactness of the latent state representation and robustness to varying degrees of non-stationarity.
A Survey of Generalisation in Deep Reinforcement Learning
TLDR
It is argued that taking a purely procedural content generation approach to benchmark design is not conducive to progress in generalisation, and fast online adaptation and tackling RL-specific problems are suggested as areas for future work on methods for generalisation.
A Relational Intervention Approach for Unsupervised Dynamics Generalization in Model-Based Reinforcement Learning
TLDR
It is empirically shown that Ẑ estimated by this method can significantly reduce dynamics prediction errors and improve the performance of model-based RL methods on zero-shot transfer to new environments with unseen dynamics.
Invariant Causal Representation Learning for Generalization in Imitation and Reinforcement Learning
TLDR
A fundamental challenge in imitation and reinforcement learning is to learn policies, representations, or dynamics that do not build on spurious correlations and generalize beyond the specific environments that they were trained on by leveraging a diverse set of training environments.
Learning Mixtures of Linear Dynamical Systems
TLDR
A two-stage meta-algorithm is developed, guaranteed to recover each ground-truth LDS model up to error Õ(√(d/T)), where T is the total sample size.
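To make the guarantee concrete, here is a toy sketch of the per-component estimation step only: ordinary least squares recovers a single LDS transition matrix from one trajectory, with error shrinking as the sample size T grows. The paper's actual contribution, a two-stage method that first clusters trajectories drawn from a mixture of systems and then refines each estimate, is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 10_000
A_true = 0.9 * np.linalg.qr(rng.standard_normal((d, d)))[0]  # stable system

# Roll out x_{t+1} = A x_t + w_t with small Gaussian process noise.
x = np.zeros((T + 1, d))
for t in range(T):
    x[t + 1] = A_true @ x[t] + 0.1 * rng.standard_normal(d)

# Ordinary least squares: A_hat = argmin_A sum_t ||x_{t+1} - A x_t||^2
X, Y = x[:-1], x[1:]
A_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T

print("Frobenius error:", np.linalg.norm(A_hat - A_true, "fro"))
```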
Learning Latent Causal Dynamics
TLDR
This work proposes a principled framework, called LiLY, to recover time-delayed latent causal variables and identify their relations from measured temporal data under different distribution shifts, and establishes the identifiability theories of nonparametric latent causal dynamics from their nonlinear mixtures under fixed dynamics and under changes.

References

SHOWING 1-10 OF 83 REFERENCES
Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning
TLDR
This work uses meta-learning to train a dynamics model prior such that, when combined with recent data, this prior can be rapidly adapted to the local context and demonstrates the importance of incorporating online adaptation into autonomous agents that operate in the real world.
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning.
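As a concrete picture of the bi-level loop this describes, here is a minimal sketch on toy 1-D regression tasks. For simplicity it uses the first-order approximation (often called FOMAML), which drops the second-order terms that full MAML backpropagates through the inner update; the step sizes and task distribution are assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 0.05, 0.01   # inner / outer step sizes (assumed values)
w = 3.0                    # the meta-parameter: an initialization to learn

def grad(w, x, y):
    # gradient of the mean squared error 0.5 * mean((w*x - y)^2)
    return np.mean((w * x - y) * x)

for step in range(2000):
    meta_grad = 0.0
    for _ in range(10):                    # sample a batch of tasks
        w_task = rng.uniform(-2, 2)        # a task is a slope to regress
        x = rng.standard_normal(20)
        y = w_task * x
        w_adapted = w - alpha * grad(w, x[:10], y[:10])   # inner adaptation
        meta_grad += grad(w_adapted, x[10:], y[10:])      # held-out evaluation
    w -= beta * meta_grad / 10             # outer (meta) update

print(w)  # drifts toward 0, the mean of the task distribution
```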
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
TLDR
A new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) is developed that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation.
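The "importance weighted" part of the name refers to V-trace, the off-policy correction that lets the central learner train on trajectories generated by slightly stale actor policies. Below is a sketch of the V-trace target computation for a single trajectory; the array layout and default clipping constants are assumptions.

```python
import numpy as np

def vtrace_targets(rewards, values, bootstrap, log_rhos,
                   gamma=0.99, rho_clip=1.0, c_clip=1.0):
    """rewards, values, log_rhos: length-T arrays for one trajectory;
    bootstrap: V(x_T); log_rhos: log pi(a_t|x_t) - log mu(a_t|x_t)."""
    ratios = np.exp(log_rhos)
    rhos = np.minimum(ratios, rho_clip)        # clipped importance weights
    cs = np.minimum(ratios, c_clip)            # clipped "trace" weights
    next_values = np.append(values[1:], bootstrap)
    deltas = rhos * (rewards + gamma * next_values - values)
    vs = np.zeros_like(values)
    acc = 0.0
    for t in reversed(range(len(rewards))):    # backward recursion
        acc = deltas[t] + gamma * cs[t] * acc
        vs[t] = values[t] + acc
    return vs                                  # regression targets for V

# With log_rhos = 0 (on-policy), this reduces to ordinary n-step returns.
```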
Fast Context Adaptation via Meta-Learning
TLDR
It is shown empirically that CAVIA outperforms MAML on regression, classification, and reinforcement learning problems, is easier to implement, and is more robust to the inner-loop learning rate.
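For contrast with the MAML sketch above, here is a toy first-order sketch of the CAVIA split: the model conditions on a small context vector phi that is reset and adapted per task in the inner loop, while the shared parameters theta are touched only by the outer loop. The scalar task family and step sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = rng.standard_normal(2)   # shared parameters, never adapted per task
alpha, beta = 0.1, 0.01          # inner / outer step sizes (assumed values)

def predict(theta, phi, x):
    return theta[0] * x + theta[1] * phi   # context enters as an extra input

for step in range(3000):
    meta_grad = np.zeros(2)
    for _ in range(10):
        b = rng.uniform(-2, 2)             # a task is an unknown offset
        x = rng.standard_normal(20)
        y = x + b
        phi = 0.0                          # context reset for every task
        err = predict(theta, phi, x[:10]) - y[:10]
        phi -= alpha * np.mean(err) * theta[1]      # inner loop: phi only
        err2 = predict(theta, phi, x[10:]) - y[10:]
        meta_grad += np.array([np.mean(err2 * x[10:]),   # outer loop:
                               np.mean(err2 * phi)])     # theta only
    theta -= beta * meta_grad / 10

# After meta-training, a single inner step on phi absorbs the task offset.
```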
Distral: Robust multitask reinforcement learning
TLDR
This work proposes a new approach for joint training of multiple tasks, which it refers to as Distral (Distill & transfer learning), and shows that the proposed learning process is more robust and more stable, attributes that are critical in deep reinforcement learning.
Guided Meta-Policy Search
TLDR
This paper proposes to learn a reinforcement learning procedure through imitation of expert policies that solve previously-seen tasks, and demonstrates significant improvements in meta-RL sample efficiency in comparison to prior work as well as the ability to scale to domains with visual observations.
Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation
TLDR
It is shown that by separating the visual transfer task from the control policy the authors achieve substantially better sample efficiency and transfer behavior, allowing an agent trained on the source task to transfer well to the target tasks.
Deep Q-learning From Demonstrations
TLDR
This paper presents an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages small sets of demonstration data to massively accelerate the learning process, and is able to automatically assess the necessary ratio of demonstration data while learning thanks to a prioritized replay mechanism.
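The demonstrations enter DQfD's objective mainly through a large-margin supervised term, J_E(Q) = max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E), combined with the usual TD losses; it forces the demonstrator's action a_E to dominate all other actions by a margin. A sketch of just that term, with an assumed margin constant:

```python
import numpy as np

def margin_loss(q_values, expert_action, margin=0.8):
    """Large-margin loss: zero once Q(s, a_E) beats every other
    action's Q-value by at least `margin` (constant assumed here)."""
    l = np.full_like(q_values, margin)
    l[expert_action] = 0.0                 # no penalty on the expert action
    return np.max(q_values + l) - q_values[expert_action]

q = np.array([1.0, 2.5, 0.3])
print(margin_loss(q, expert_action=1))  # 0.0: expert action already dominates
print(margin_loss(q, expert_action=2))  # 3.0: Q(s, a_E) must rise substantially
```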
Model-Invariant State Abstractions for Model-Based Reinforcement Learning
TLDR
This paper introduces a new type of state abstraction called model-invariance, which allows for generalization to novel combinations of unseen values of state variables, something that non-factored forms of state abstractions cannot do.
Invariant Causal Prediction for Block MDPs
TLDR
This paper uses tools from causal inference to propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting, and proves that for certain classes of environments, this approach outputs with high probability a state abstraction corresponding to the causal feature set with respect to the return.
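A toy sketch of the invariance-testing intuition behind this line of work (not the paper's exact MISA procedure): a candidate feature set is kept only if a single pooled model leaves residuals with matching statistics in every environment, so a feature whose relation to the target shifts across environments fails the check. The data-generating setup below is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_mean_gap(X_envs, y_envs):
    # One pooled least-squares fit, then per-environment residual means;
    # a large gap suggests the feature set is not invariant.
    X = np.concatenate(X_envs); y = np.concatenate(y_envs)
    w = np.linalg.lstsq(X, y, rcond=None)[0]
    m = [np.mean(y_e - X_e @ w) for X_e, y_e in zip(X_envs, y_envs)]
    return max(m) - min(m)

def make_env(n, shift):
    x1 = rng.standard_normal(n)                     # causal feature
    y = 2 * x1 + rng.standard_normal(n)
    x2 = y + shift + 0.1 * rng.standard_normal(n)   # env-dependent effect of y
    return np.column_stack([x1, x2]), y

(Xa, ya), (Xb, yb) = make_env(1000, 0.0), make_env(1000, 1.5)

print("{x1}    gap:", residual_mean_gap([Xa[:, :1], Xb[:, :1]], [ya, yb]))
print("{x1,x2} gap:", residual_mean_gap([Xa, Xb], [ya, yb]))  # clearly larger
```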
...