Corpus ID: 221470196

Sample-Efficient Automated Deep Reinforcement Learning

@article{Franke2021SampleEfficientAD,
  title={Sample-Efficient Automated Deep Reinforcement Learning},
  author={J. Franke and Gregor Koehler and Andr{\'e} Biedenkapp and F. Hutter},
  journal={ArXiv},
  year={2021},
  volume={abs/2009.01555}
}
Despite significant progress in challenging problems across various domains, applying state-of-the-art deep reinforcement learning (RL) algorithms remains challenging due to their sensitivity to the choice of hyperparameters. This sensitivity can partly be attributed to the non-stationarity of the RL problem, potentially requiring different hyperparameter settings at various stages of the learning process. Additionally, in the RL setting, hyperparameter optimization (HPO) requires a large…
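The abstract's observation that the RL problem is non-stationary, so that different training stages may call for different hyperparameter settings, is the motivation for adapting hyperparameters online rather than fixing them before training. As a purely illustrative sketch of that general idea (not the paper's actual algorithm; all names and the data layout below are assumptions), a population-based exploit-and-explore round might look like this:

```python
import copy
import random

def exploit_and_explore(population, perturb=0.2):
    """One illustrative exploit/explore round over a population of agents.

    Each member is assumed to be a dict with "score" (recent evaluation
    return), "weights" (an opaque parameter blob) and "hyperparams" (a dict
    of floats such as learning rate or discount factor).
    """
    ranked = sorted(population, key=lambda m: m["score"], reverse=True)
    elite, weakest = ranked[0], ranked[-1]

    # Exploit: the weakest member inherits the best member's weights
    # and hyperparameters.
    weakest["weights"] = copy.deepcopy(elite["weights"])
    weakest["hyperparams"] = copy.deepcopy(elite["hyperparams"])

    # Explore: perturb the copied hyperparameters, so the effective settings
    # can differ across stages of training as the learning problem changes.
    for name, value in weakest["hyperparams"].items():
        weakest["hyperparams"][name] = value * random.choice([1 - perturb, 1 + perturb])

    return ranked
```

Interleaving such rounds with ordinary training steps is the general pattern that Population Based Training (listed in the references below) follows; sample efficiency then hinges on how much environment interaction each such round consumes.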

Citations

Evolving Reinforcement Learning Algorithms
A method for meta-learning reinforcement learning algorithms by searching over the space of computational graphs which compute the loss function for a value-based model-free RL agent to optimize; the learned algorithms show resemblance to recently proposed RL algorithms that address overestimation in value-based methods.
RL-DARTS: Differentiable Architecture Search for Reinforcement Learning
Throughout the training process, the supernet is shown to gradually learn better cells, leading to alternative architectures that can be highly competitive against manually designed policies while also verifying previous design choices for RL policies.
Towards Automatic Actor-Critic Solutions to Continuous Control
This paper creates an evolutionary approach that automatically tunes these design decisions and eliminates the RL-specific hyperparameters from the Soft Actor-Critic algorithm, and then applies it to new control tasks to find high-performance solutions with minimal compute and research effort.
Tuning Mixed Input Hyperparameters on the Fly for Efficient Population Based AutoRL
A new (provably) efficient hierarchical approach for optimizing both continuous and categorical variables, using a new time-varying bandit algorithm specifically designed for the population-based training regime.
Towards robust and domain agnostic reinforcement learning competitions
A new framework of competition design is presented that promotes the development of algorithms that overcome barriers to entry, proposing four central mechanisms for achieving this end, including submission retraining, domain randomization, and desemantization through domain obfuscation.

References

Showing 1-10 of 46 references
Evolution-Guided Policy Gradient in Reinforcement Learning
Evolutionary Reinforcement Learning (ERL), a hybrid algorithm that leverages the population of an EA to provide diversified data to train an RL agent, and reinserts the RL agent into the EA population periodically to inject gradient information into the EA.
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
A new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) is developed that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation.
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
This paper proposes soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework, which achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods. (The maximum entropy objective it refers to is written out after this reference list.)
Collaborative Evolutionary Reinforcement Learning
Collaborative Evolutionary Reinforcement Learning (CERL) is introduced, a scalable framework that comprises a portfolio of policies that simultaneously explore and exploit diverse regions of the solution space and significantly outperforms its composite learners while remaining overall more sample-efficient.
Deep Reinforcement Learning that Matters
Challenges posed by reproducibility, proper experimental techniques, and reporting procedures are investigated, and guidelines to make future results in deep RL more reproducible are suggested.
Population Based Training of Neural Networks
Population Based Training is presented, a simple asynchronous optimisation algorithm which effectively utilises a fixed computational budget to jointly optimise a population of models and their hyperparameters to maximise performance.
How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies
When the discount factor is progressively increased up to its final value, it is empirically shown that both the number of learning steps and the risk of falling into a local optimum during learning can be significantly reduced, connecting the discussion with the exploration/exploitation dilemma.
Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control
The significance of hyper-parameters in policy gradients for continuous control, general variance in the algorithms, and reproducibility of reported results are investigated, and guidelines on reporting novel results as comparisons against baseline methods are provided.
Proximal Distilled Evolutionary Reinforcement Learning
A novel algorithm called Proximal Distilled Evolutionary Reinforcement Learning (PDERL), characterised by a hierarchical integration between evolution and learning, which outperforms ERL as well as two state-of-the-art RL algorithms, PPO and TD3, in all tested environments.
Continuous control with deep reinforcement learning
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
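For background on the Soft Actor-Critic entry above: the maximum entropy reinforcement learning framework it builds on augments the expected return with a policy-entropy bonus. As standard background (not something introduced by this paper), the objective can be written as

J(\pi) = \sum_t \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],

where \alpha is a temperature hyperparameter trading off reward against the entropy \mathcal{H} of the policy, and is itself an example of a setting whose best value can change over the course of training.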