Corpus ID: 145049081

Collaborative Evolutionary Reinforcement Learning

@inproceedings{Khadka2019CollaborativeER,
  title={Collaborative Evolutionary Reinforcement Learning},
  author={Shauharda Khadka and Somdeb Majumdar and Tarek Nassar and Zach Dwiel and Evren Tumer and Santiago Miret and Yinyin Liu and Kagan Tumer},
  booktitle={ICML},
  year={2019}
}
Deep reinforcement learning algorithms have been successfully applied to a range of challenging control tasks. […] A collection of learners - typically proven algorithms like TD3 - optimizes over varying time-horizons, yielding this diverse portfolio. All learners contribute to and draw from a shared replay buffer for greater sample efficiency, and computational resources are dynamically distributed to favor the best-performing learners as a form of online algorithm selection.
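The two mechanisms in the abstract - the shared replay buffer and fitness-driven resource allocation - can be sketched minimally as below. `SharedReplayBuffer` and the softmax-style `allocate_resources` rule are illustrative simplifications, not the paper's exact interfaces (CERL's actual allocation scheme may differ, e.g. a bandit-style rule):

```python
import random
from collections import deque

class SharedReplayBuffer:
    """One buffer that every learner writes to and samples from,
    so each collected transition can train the whole portfolio."""
    def __init__(self, capacity=100_000):
        self.data = deque(maxlen=capacity)

    def push(self, transition):
        self.data.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.data), min(batch_size, len(self.data)))

def allocate_resources(fitness_scores, total_workers):
    """Give rollout workers to learners in proportion to recent fitness:
    a simple form of online algorithm selection."""
    lo = min(fitness_scores)
    shifted = [f - lo + 1e-6 for f in fitness_scores]  # make scores non-negative
    total = sum(shifted)
    return [max(1, round(total_workers * s / total)) for s in shifted]
```

Each learner would push its rollout transitions into the shared buffer and sample mini-batches from it, while the allocation rule is re-run periodically as fitness estimates change.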

Citations

Proximal Distilled Evolutionary Reinforcement Learning
TLDR
A novel algorithm, Proximal Distilled Evolutionary Reinforcement Learning (PDERL), characterised by a hierarchical integration between evolution and learning, which outperforms ERL as well as two state-of-the-art RL algorithms, PPO and TD3, in all tested environments.
Coevolutionary Deep Reinforcement Learning
TLDR
It is demonstrated that competitive pressures can be utilised to improve self-play; the algorithm leverages coevolution, an evolution-inspired process in which individuals are compelled to innovate and adapt, to optimise the training of a population of reinforcement learning agents.
Competitive and Cooperative Heterogeneous Deep Reinforcement Learning
TLDR
This work presents a competitive and cooperative heterogeneous deep reinforcement learning framework called C2HRL, which aims to learn a superior agent that exceeds the capabilities of the individual agents in an agent pool through two agent management mechanisms.
Evolutionary Action Selection for Gradient-based Policy Learning
TLDR
This paper proposes Evolutionary Action Selection-Twin Delayed Deep Deterministic Policy Gradient (EAS-TD3), a novel combination of EA and DRL that focuses on optimizing the action chosen by the policy network and attempts to obtain high-quality actions to guide policy learning through an evolutionary algorithm.
Genetic Soft Updates for Policy Evolution in Deep Reinforcement Learning
TLDR
A novel mixed framework that exploits periodical genetic evaluation to soft-update the weights of a DRL agent and employs formal verification to confirm the policy improvement, mitigating the inefficient exploration and hyper-parameter sensitivity of DRL.
A History-based Framework for Online Continuous Action Ensembles in Deep Reinforcement Learning
This work seeks optimized action-ensemble techniques for deep reinforcement learning to decrease the hyperparameter tuning effort and improve performance and robustness, while avoiding parallel …
Sample-Efficient Automated Deep Reinforcement Learning
TLDR
A population-based automated RL (AutoRL) framework to meta-optimize arbitrary off-policy RL algorithms, optimizing the hyperparameters and the neural architecture while simultaneously training the agent; sharing the collected experience across the population substantially increases the sample efficiency of the meta-optimization.
Effective Diversity in Population-Based Reinforcement Learning
TLDR
This paper introduces both evolutionary and gradient-based instantiations of DvD and shows they effectively improve exploration without reducing performance when better exploration is not required, and adapts the degree of diversity during training using online learning techniques.
Coordinated Exploration via Intrinsic Rewards for Multi-Agent Reinforcement Learning
TLDR
It is argued that exploration in cooperative multi-agent settings can be accelerated and improved if agents coordinate with respect to the regions of the state space they explore while maximizing extrinsic returns.
...

References

SHOWING 1-10 OF 49 REFERENCES
Evolution-Guided Policy Gradient in Reinforcement Learning
TLDR
Evolutionary Reinforcement Learning (ERL) is introduced, a hybrid algorithm that leverages the population of an EA to provide diversified data to train an RL agent, and reinserts the RL agent into the EA population periodically to inject gradient information into the EA.
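The loop the summary describes can be sketched in a few lines; `evaluate`, `train_rl`, and the dict-based agents are placeholder abstractions, not ERL's actual interfaces:

```python
def erl_generation(population, rl_agent, replay_buffer, evaluate, train_rl):
    """One ERL generation (sketch): the EA population generates diverse
    experience for the RL learner, and the trained RL agent is injected
    back into the population in place of the weakest individual."""
    fitnesses = []
    for individual in population:
        fitness, transitions = evaluate(individual)
        replay_buffer.extend(transitions)   # diversified data for the RL agent
        fitnesses.append(fitness)
    train_rl(rl_agent, replay_buffer)       # gradient-based update off the buffer
    worst = fitnesses.index(min(fitnesses))
    population[worst] = dict(rl_agent)      # inject gradient information into the EA
    return population, fitnesses
```

Selection and mutation of the surviving population would follow this step in a full implementation.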
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
TLDR
This paper proposes soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework, which achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods.
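The entropy-regularised backup behind that objective can be written out as a scalar sketch; `soft_target` and `gaussian_entropy` are illustrative helpers with all inputs assumed precomputed, not the library's API:

```python
import math

def soft_target(reward, gamma, q1_next, q2_next, logp_next, alpha):
    """Soft Bellman target used in SAC: the entropy bonus -alpha * log pi
    is folded into the next-state value, using the minimum of two critics."""
    v_next = min(q1_next, q2_next) - alpha * logp_next
    return reward + gamma * v_next

def gaussian_entropy(stddev):
    """Differential entropy of a 1-D Gaussian policy: 0.5 * log(2*pi*e*sigma^2)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * stddev ** 2)
```

The temperature `alpha` trades off return against policy entropy; larger values keep the stochastic actor more exploratory.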
Evolution Strategies as a Scalable Alternative to Reinforcement Learning
TLDR
This work explores the use of Evolution Strategies (ES), a class of black-box optimization algorithms, as an alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients, and highlights several advantages of ES as a black-box optimization technique.
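A minimal version of the ES update that summary refers to, using antithetic (mirrored) sampling; the function name and the tiny quadratic objective in the usage note are illustrative:

```python
import random

def es_step(theta, fitness, sigma=0.1, alpha=0.02, n_pairs=50, rng=random):
    """One Evolution Strategies update: estimate the gradient of the
    Gaussian-smoothed objective E[F(theta + sigma*eps)] from fitness
    evaluations alone, with mirrored perturbations to reduce variance."""
    grad = [0.0] * len(theta)
    for _ in range(n_pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        f_plus = fitness([t + sigma * e for t, e in zip(theta, eps)])
        f_minus = fitness([t - sigma * e for t, e in zip(theta, eps)])
        for i, e in enumerate(eps):
            grad[i] += (f_plus - f_minus) * e
    scale = alpha / (2 * n_pairs * sigma)
    return [t + scale * g for t, g in zip(theta, grad)]
```

Iterating `es_step` on a toy objective such as `F(x) = -(x - 3)^2` drives `theta` toward the maximizer without ever computing an analytic gradient.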
Benchmarking Deep Reinforcement Learning for Continuous Control
TLDR
This work presents a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure.
GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms
TLDR
This paper presents the GEP-PG approach, taking the best of both worlds by sequentially combining a Goal Exploration Process and two variants of DDPG on a low dimensional deceptive reward problem and on the larger Half-Cheetah benchmark.
ES is more than just a traditional finite-difference approximator
TLDR
This work highlights differences between ES and gradient descent that can channel ES into distinct areas of the search space, and consequently toward networks with distinct properties, and examines the consequences for optimization.
Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control
TLDR
The significance of hyper-parameters in policy gradients for continuous control, general variance in the algorithms, and reproducibility of reported results are investigated, and guidelines are provided on reporting novel results as comparisons against baseline methods.
Population Based Training of Neural Networks
TLDR
Population Based Training is presented, a simple asynchronous optimisation algorithm which effectively utilises a fixed computational budget to jointly optimise a population of models and their hyperparameters to maximise performance.
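The exploit/explore cycle that summary describes can be sketched as follows; the worker dicts and the 0.8x/1.2x perturbation factors are illustrative simplifications of the published scheme:

```python
import copy
import random

def pbt_exploit_explore(population, rng=random):
    """One PBT step: the bottom quartile of workers copies (exploits)
    weights and hyperparameters from a top-quartile worker, then
    perturbs the hyperparameters (explores)."""
    ranked = sorted(population, key=lambda w: w["score"], reverse=True)
    cut = max(1, len(ranked) // 4)
    top, bottom = ranked[:cut], ranked[-cut:]
    for worker in bottom:
        source = rng.choice(top)
        worker["weights"] = copy.deepcopy(source["weights"])
        worker["hparams"] = {k: v * rng.choice([0.8, 1.2])
                             for k, v in source["hparams"].items()}
    return population
```

Because weights travel with hyperparameters, the population performs a joint search over both under a fixed compute budget.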
Parameter Space Noise for Exploration
TLDR
This work demonstrates that RL with parameter noise learns more efficiently than traditional RL with action space noise and evolutionary strategies individually through experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete action environments as well as continuous control tasks.
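The distinction the summary draws can be illustrated in a few lines; `perturb_parameters` and `noisy_action` are illustrative helpers, not the adaptive-variance scheme used in the paper:

```python
import random

def perturb_parameters(params, stddev, rng=random):
    """Parameter-space noise: perturb the policy weights once (e.g. per
    episode), so the induced exploration is state-dependent and
    consistent across time steps."""
    return [p + rng.gauss(0.0, stddev) for p in params]

def noisy_action(action, stddev, rng=random):
    """Action-space noise: independent Gaussian added to every action,
    uncorrelated from step to step."""
    return action + rng.gauss(0.0, stddev)
```

With parameter noise, the same perturbed policy is held fixed for a whole rollout, whereas action noise jitters each decision independently.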
VIME: Variational Information Maximizing Exploration
TLDR
VIME is introduced, an exploration strategy based on maximization of information gain about the agent's belief of environment dynamics which efficiently handles continuous state and action spaces and can be applied with several different underlying RL algorithms.
...