Model-Free Generative Replay for Lifelong Reinforcement Learning: Application to Starcraft-2

Zachary A. Daniels, Aswin Raghavan, Jesse Hostetler, Abrar Rahman, Indranil Sur, Michael R. Piacentino, Ajay Divakaran
One approach to meet the challenges of deep lifelong reinforcement learning (LRL) is careful management of the agent’s learning experiences, in order to learn (without forgetting) and build internal meta-models (of the tasks, environments, agents, and world). Generative replay (GR) is a biologically-inspired replay mechanism that augments learning experiences with self-labelled examples drawn from an internal generative model that is updated over time. In this paper, we present a version of GR…
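The core mechanism described above can be sketched in a few lines: instead of storing past data, a generative model supplies pseudo-experiences that are mixed with new-task data at each update. This is a minimal illustrative sketch, not the paper's implementation; the class and method names (`GenerativeReplayTrainer`, `ToyGenerator`) are invented for this example, and the "generator" is a stand-in that memorizes a few items.

```python
import random

class GenerativeReplayTrainer:
    """Toy sketch of generative replay: a generative model produces
    pseudo-experiences that are interleaved with new-task data, so
    no buffer of raw past experiences is kept."""

    def __init__(self, generator, replay_ratio=0.5):
        self.generator = generator      # stand-in for a learned generative model
        self.replay_ratio = replay_ratio

    def make_batch(self, new_experiences, batch_size):
        # Split the batch between replayed (generated) and fresh samples.
        n_replay = int(batch_size * self.replay_ratio)
        n_new = batch_size - n_replay
        replayed = [self.generator.sample() for _ in range(n_replay)]
        fresh = random.sample(new_experiences, min(n_new, len(new_experiences)))
        return fresh + replayed

class ToyGenerator:
    """Trivial stand-in generator that 'samples' memorized items."""
    def __init__(self, memorized):
        self.memorized = memorized
    def sample(self):
        return random.choice(self.memorized)

trainer = GenerativeReplayTrainer(ToyGenerator(["old_a", "old_b"]), replay_ratio=0.5)
batch = trainer.make_batch(["new_1", "new_2", "new_3"], batch_size=4)
print(len(batch))  # 4
```

In a real GR system the generator would itself be retrained on a mix of its own samples and new data, which is what allows it to accumulate tasks over time.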

Towards Continual Reinforcement Learning: A Review and Perspectives

This review provides a taxonomy of different continual RL formulations, mathematically characterizes the non-stationary dynamics of each setting, and gives an overview of benchmarks used in the literature and important metrics for understanding agent performance.

Quantum Multi-Agent Meta Reinforcement Learning

This article re-designs multi-agent reinforcement learning (MARL) based on the unique characteristics of quantum neural networks (QNNs), which have two separate dimensions of trainable parameters: angle parameters affecting the output qubit states, and pole parameters associated with the output measurement basis.

Generative replay with feedback connections as a general strategy for continual learning

This work reduces the computational cost of generative replay by integrating the generative model into the main model, equipping it with generative feedback (backward) connections; the authors believe this to be an important first step towards making the powerful technique of generative replay scalable to real-world continual learning applications.

Continual Learning Using World Models for Pseudo-Rehearsal

This work proposes a method to continually learn these internal world models through the interleaving of internally generated episodes of past experiences (i.e., pseudo-rehearsal), and shows that modern policy-gradient-based reinforcement learning algorithms can use this internal model to continually learn to optimize reward based on the world model's representation of the environment.

Using World Models for Pseudo-Rehearsal in Continual Learning

This work proposes a method to continually learn internal world models through the interleaving of internally generated rollouts from past experiences, and shows this method can sequentially learn unsupervised temporal prediction, without task labels, in a disparate set of Atari games.

Selective Experience Replay for Lifelong Learning

Overall, the results show that selective experience replay, when suitable selection algorithms are employed, can prevent catastrophic forgetting and is consistently the best approach on all domains tested.
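The key design choice in selective experience replay is the rule for deciding which experiences a fixed-size buffer keeps. The paper compares several selection strategies; as one common illustrative example (not necessarily the paper's best-performing one), reservoir sampling maintains a uniform random sample over everything seen so far:

```python
import random

def reservoir_update(buffer, item, seen, capacity):
    """One possible selection rule for a fixed-size replay buffer:
    reservoir sampling keeps each of the `seen + 1` items with equal
    probability, so the buffer stays representative of the full history."""
    if len(buffer) < capacity:
        buffer.append(item)
    else:
        # Replace a random slot with probability capacity / (seen + 1).
        j = random.randrange(seen + 1)
        if j < capacity:
            buffer[j] = item
    return buffer

buffer, capacity = [], 3
for seen, item in enumerate(range(10)):
    reservoir_update(buffer, item, seen, capacity)
print(len(buffer))  # 3
```

Alternative selection criteria (e.g., maximizing coverage of the state space or favoring surprising transitions) plug into the same buffer-update slot.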

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

A new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) is developed that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation.
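Because IMPALA's actors lag behind the central learner, updates must correct for the gap between the behavior policy mu and the current policy pi. A central ingredient of its V-trace correction is truncated importance weights; the sketch below shows just that clipping step with made-up probability values (the full V-trace target involves additional terms not shown here):

```python
def truncated_is_weights(pi_probs, mu_probs, rho_bar=1.0):
    """Sketch of truncated importance sampling weights: the ratio
    pi(a|s) / mu(a|s) is clipped at rho_bar to bound the variance of
    off-policy corrections in actor-learner training."""
    return [min(rho_bar, p / m) for p, m in zip(pi_probs, mu_probs)]

# Illustrative probabilities under the learner policy pi and actor policy mu.
print(truncated_is_weights([0.9, 0.1], [0.5, 0.5]))  # [1.0, 0.2]
```

The clipping constant rho_bar trades off bias against variance: with rho_bar = inf the estimate is unbiased but high-variance; smaller values stabilize learning at the cost of some bias.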

Lifelong Learning using Eigentasks: Task Separation, Skill Acquisition, and Selective Transfer

This work shows improved performance over the state-of-the-art in supervised continual learning, and evidence of forward knowledge transfer in a lifelong RL application in the game Starcraft-2.

Experience Replay for Continual Learning

This work shows that using experience replay buffers for all past events, with a mixture of on- and off-policy learning, can learn new tasks quickly while substantially reducing catastrophic forgetting in both Atari and DMLab domains, even matching the performance of methods that require task identities.

Continuous Coordination As a Realistic Scenario for Lifelong Learning

This work introduces a multi-agent lifelong learning testbed that supports both zero-shot and few-shot settings, and empirically shows that the agents trained in this setup are able to coordinate well with unseen agents, without any additional assumptions made by previous works.

Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting

This work provides a novel method for lifelong policy gradient learning that trains lifelong function approximators directly via policy gradients, allowing the agent to benefit from accumulated knowledge throughout the entire training process.