Fast Population-Based Reinforcement Learning on a Single Machine

Arthur Flajolet, Claire Bizon Monroc, Karim Beguir, Thomas Pierrot
Training populations of agents has demonstrated great promise in Reinforcement Learning for stabilizing training, improving exploration and asymptotic performance, and generating a diverse set of solutions. However, population-based training is often not considered by practitioners as it is perceived to be either prohibitively slow (when implemented sequentially), or computationally expensive (if agents are trained in parallel on independent accelerators). In this work, we compare… 
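The alternative the abstract alludes to, training a whole population on one accelerator instead of sequentially or on independent devices, can be illustrated with a minimal sketch using `jax.vmap` to batch one update step over all agents at once. The tiny linear "policy" and loss below are illustrative stand-ins, not the paper's implementation:

```python
# Minimal sketch: vectorising a population's update step with jax.vmap,
# so N agents train in one pass on a single accelerator rather than
# sequentially or on N separate devices. All names are illustrative.
import jax
import jax.numpy as jnp

POP_SIZE = 8

def init_params(key):
    # One tiny linear "policy" per agent (toy stand-in for a network).
    return jax.random.normal(key, (4,))

def loss(params, obs):
    # Dummy per-agent loss; a real RL loss would go here.
    return jnp.sum((params * obs) ** 2)

def train_step(params, obs, lr=0.01):
    grads = jax.grad(loss)(params, obs)
    return params - lr * grads

keys = jax.random.split(jax.random.PRNGKey(0), POP_SIZE)
population = jax.vmap(init_params)(keys)        # shape (POP_SIZE, 4)
obs_batch = jnp.ones((POP_SIZE, 4))

# One compiled, vectorised update for the whole population.
population = jax.jit(jax.vmap(train_step))(population, obs_batch)
print(population.shape)  # (8, 4)
```

The same pattern extends to full RL training loops: mapping the per-agent step over a leading population axis lets the compiler fuse the agents' computations into single device kernels.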

Accelerated Methods for Deep Reinforcement Learning

This work investigates how to optimize existing deep RL algorithms for modern computers, specifically for a combination of CPUs and GPUs, and confirms that both policy gradient and Q-value learning algorithms can be adapted to learn using many parallel simulator instances.

Population Based Training of Neural Networks

Population Based Training is presented, a simple asynchronous optimisation algorithm which effectively utilises a fixed computational budget to jointly optimise a population of models and their hyperparameters to maximise performance.
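The exploit/explore mechanics summarised above can be sketched in a few lines: underperformers copy weights and hyperparameters from top performers (exploit), then perturb the hyperparameters (explore). This is a toy sketch with an illustrative `evaluate` function, not the paper's implementation:

```python
# Toy sketch of Population Based Training's exploit/explore step.
# All names and the evaluation function are illustrative stand-ins.
import copy
import random

random.seed(0)

population = [
    {"lr": 10 ** random.uniform(-4, -2), "weights": [0.0], "score": 0.0}
    for _ in range(8)
]

def evaluate(member):
    # Stand-in for a partial training run; here higher lr scores better.
    return member["lr"]

def pbt_step(population, frac=0.25):
    for m in population:
        m["score"] = evaluate(m)
    ranked = sorted(population, key=lambda m: m["score"])
    cut = max(1, int(len(ranked) * frac))
    bottom, top = ranked[:cut], ranked[-cut:]
    for m in bottom:
        donor = random.choice(top)
        m["weights"] = copy.deepcopy(donor["weights"])     # exploit
        m["lr"] = donor["lr"] * random.choice([0.8, 1.2])  # explore
    return population

population = pbt_step(population)
```

In the actual algorithm this step runs asynchronously while each member continues training, so the fixed compute budget is never idle.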

Effective Diversity in Population-Based Reinforcement Learning

This paper introduces both evolutionary and gradient-based instantiations of DvD (Diversity via Determinants), shows that they effectively improve exploration without reducing performance when better exploration is not required, and adapts the degree of diversity during training using online learning techniques.

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

A new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) is developed that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation.

Soft Actor-Critic Algorithms and Applications

Soft Actor-Critic (SAC), the recently introduced off-policy actor-critic algorithm based on the maximum entropy RL framework, achieves state-of-the-art performance, outperforming prior on-policy and off-policy methods in both sample efficiency and asymptotic performance.

Scaling MAP-Elites to deep neuroevolution

A new hybrid algorithm called MAP-Elites with Evolution Strategies (ME-ES) is designed and evaluated for post-damage recovery in a difficult high-dimensional control task where traditional MAP-Elites fails, and it is shown that ME-ES performs efficient exploration, on par with state-of-the-art exploration algorithms, in high-dimensional control tasks with strongly deceptive rewards.

Sample Factory: Egocentric 3D Control from Pixels at 100000 FPS with Asynchronous Reinforcement Learning

The "Sample Factory" is presented, a high-throughput training system optimized for a single-machine setting that combines a highly efficient, asynchronous, GPU-based sampler with off-policy correction techniques, allowing for throughput higher than 10^5 environment frames per second on non-trivial control problems in 3D without sacrificing sample efficiency.

Population-Guided Parallel Policy Search for Reinforcement Learning

A new population-guided parallel learning scheme is proposed to enhance the performance of off-policy reinforcement learning (RL): an augmented loss function for the policy update enlarges the overall search region covered by the multiple learners.

Deep Reinforcement Learning at the Edge of the Statistical Precipice

This paper argues that reliable evaluation in the few-run deep RL regime cannot ignore the uncertainty in results without running the risk of slowing down progress in the field; it advocates reporting interval estimates of aggregate performance and proposes performance profiles to account for the variability in results.

Policy gradient assisted MAP-Elites

PGA-MAP-Elites is presented, a novel algorithm that enables MAP-Elites to efficiently evolve large neural network controllers by introducing a gradient-based variation operator inspired by Deep Reinforcement Learning.
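The archive update that PGA-MAP-Elites inherits from MAP-Elites is simple: a solution is stored in the cell matching its behaviour descriptor only if it beats the current occupant's fitness. A toy sketch follows; the fitness and descriptor functions here are illustrative stand-ins, not any paper's benchmark:

```python
# Toy sketch of the core MAP-Elites archive update: keep a solution in
# its behaviour-descriptor cell only if it beats the incumbent.
import random

random.seed(0)
GRID = 10
archive = {}  # cell index -> (fitness, solution)

def evaluate(solution):
    fitness = -sum(x * x for x in solution)  # toy fitness (<= 0)
    # Toy 1-D behaviour descriptor, binned into GRID cells.
    cell = min(GRID - 1, int(abs(solution[0]) * GRID))
    return fitness, cell

for _ in range(200):
    solution = [random.uniform(-1, 1) for _ in range(3)]
    fitness, cell = evaluate(solution)
    if cell not in archive or fitness > archive[cell][0]:
        archive[cell] = (fitness, solution)

print(len(archive))  # number of filled cells, at most GRID
```

PGA-MAP-Elites keeps this loop but replaces the purely random variation with a mix of genetic mutation and a policy-gradient operator, which is what lets it scale to large neural controllers.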