Corpus ID: 235683612

Tuning Mixed Input Hyperparameters on the Fly for Efficient Population Based AutoRL

@inproceedings{ParkerHolder2021TuningMI,
  title={Tuning Mixed Input Hyperparameters on the Fly for Efficient Population Based AutoRL},
  author={Jack Parker-Holder and Vu Nguyen and Shaan Desai and Stephen J. Roberts},
  booktitle={Neural Information Processing Systems},
  year={2021}
}
Despite a series of recent successes in reinforcement learning (RL), many RL algorithms remain sensitive to hyperparameters. As such, there has recently been interest in the field of AutoRL, which seeks to automate design decisions to create more general algorithms. Recent work suggests that population-based approaches may be effective AutoRL algorithms, learning hyperparameter schedules on the fly. In particular, the PB2 algorithm is able to achieve strong performance in RL tasks by… 
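
To make the "hyperparameter schedules on the fly" idea concrete, here is a minimal sketch of the population-based training (PBT) exploit/explore loop that PB2 builds on: low performers copy the weights of high performers and perturb their hyperparameters. All names here (Agent, train_step, perturb) are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a PBT-style exploit/explore loop (illustrative only).
import random


class Agent:
    def __init__(self, hypers):
        self.hypers = dict(hypers)   # e.g. {"lr": 1e-3}
        self.weights = None          # network weights would live here
        self.score = 0.0

    def train_step(self):
        # Placeholder for one unit of RL training with the current hypers.
        self.score += random.gauss(self.hypers["lr"] * 100, 1.0)


def perturb(hypers):
    # Random explore step; PB2 replaces this with a model-based suggestion.
    return {k: v * random.choice([0.8, 1.2]) for k, v in hypers.items()}


population = [Agent({"lr": 10 ** random.uniform(-5, -2)}) for _ in range(8)]
for step in range(20):
    for agent in population:
        agent.train_step()
    ranked = sorted(population, key=lambda a: a.score)
    # Exploit: the bottom two copy the top two, then explore new hypers.
    for loser, winner in zip(ranked[:2], ranked[-2:]):
        loser.weights = winner.weights
        loser.score = winner.score
        loser.hypers = perturb(winner.hypers)

print("best lr:", max(population, key=lambda a: a.score).hypers["lr"])
```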

Bayesian Generational Population-Based Training

This paper introduces two new innovations in PBT-style methods that employ trust-region-based Bayesian optimization, enabling full coverage of the high-dimensional mixed hyperparameter search space, and shows that, using a generational approach, architectures and hyperparameters can also be learned jointly on-the-fly in a single training run.
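
The trust-region idea over a mixed space lends itself to a compact illustration: candidates are proposed inside a box around the incumbent for continuous dimensions and inside a Hamming ball for categorical ones. This is a hedged sketch; CONT_BOUNDS, CAT_CHOICES, and propose are assumed names, not the paper's code.

```python
# Illustrative trust-region proposal over a mixed hyperparameter space.
import random

CONT_BOUNDS = {"lr": (1e-5, 1e-2)}                     # continuous dims
CAT_CHOICES = {"activation": ["relu", "tanh", "elu"]}  # categorical dims


def propose(incumbent, radius=0.2, cat_flip_prob=0.3):
    """Sample a candidate near the incumbent: a scaled box for continuous
    dims, a Hamming ball (random flips) for categorical dims."""
    cand = {}
    for k, (lo, hi) in CONT_BOUNDS.items():
        span = (hi - lo) * radius
        cand[k] = min(hi, max(lo, incumbent[k] + random.uniform(-span, span)))
    for k, choices in CAT_CHOICES.items():
        cand[k] = random.choice(choices) if random.random() < cat_flip_prob \
            else incumbent[k]
    return cand


best = {"lr": 1e-3, "activation": "relu"}
print(propose(best))
```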

ARLO: A Framework for Automated Reinforcement Learning

This work proposes a general and flexible framework, ARLO: Automated Reinforcement Learning Optimizer, to construct automated pipelines for AutoRL, and provides a Python implementation of such pipelines, released as an open-source library.

Automated Reinforcement Learning (AutoRL): A Survey and Open Problems

This survey seeks to unify the field of AutoRL, provide a common taxonomy, discuss each area in detail and pose open problems of interest to researchers going forward.

RL-DARTS: Differentiable Architecture Search for Reinforcement Learning

Throughout this training process, it is shown that the supernet gradually learns better cells, leading to alternative architectures that are highly competitive with manually designed policies and that also validate previous design choices for RL policies.

Differentiable Architecture Search for Reinforcement Learning

It is discovered that the discrete architectures found can achieve up to 250% of the performance of manually designed architectures on both discrete and continuous action-space environments, across off-policy and on-policy RL algorithms, at only 3x more computation time.

ES-ENAS: Blackbox Optimization over Hybrid Spaces via Combinatorial and Continuous Evolution

ES-ENAS is proposed, a simple and modular joint optimization procedure combining the class of sample-efficient smoothed-gradient techniques with combinatorial optimizers in a highly scalable and intuitive way, inspired by the one-shot or supernet paradigm introduced in Efficient Neural Architecture Search.
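
A minimal sketch of the joint loop this describes: discrete architecture choices are proposed alongside antithetic ES perturbations that estimate gradients for the continuous weights, sharing the same evaluations. The "controller" here is simplified to uniform random sampling (the real method uses a combinatorial optimizer), and the reward function is an illustrative stand-in.

```python
# Sketch of joint discrete/continuous blackbox optimization, ES-ENAS style.
import numpy as np

rng = np.random.default_rng(0)
ops = ["conv3", "conv5", "skip"]     # discrete search space
w = np.zeros(4)                      # continuous weights
sigma, lr = 0.1, 0.05


def reward(weights, op):
    # Toy objective: each op carries a bonus; weights should reach 1.0.
    bonus = {"conv3": 0.0, "conv5": 0.3, "skip": -0.2}[op]
    return bonus - np.sum((weights - 1.0) ** 2)


op_scores = {o: [] for o in ops}
for step in range(200):
    op = rng.choice(ops)             # simplified controller: random choice
    eps = rng.normal(size=w.shape)
    r_plus = reward(w + sigma * eps, op)
    r_minus = reward(w - sigma * eps, op)
    w += lr * (r_plus - r_minus) / (2 * sigma) * eps   # antithetic ES step
    op_scores[op].append((r_plus + r_minus) / 2)

best_op = max(ops, key=lambda o: np.mean(op_scores[o]))
print("selected op:", best_op, "weights:", w.round(2))
```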

Event-Triggered Time-Varying Bayesian Optimization

This work proposes an event-triggered algorithm, ET-GP-UCB, that detects changes in the objective function online and is competitive with state-of-the-art algorithms even though it requires no knowledge about the temporal changes.
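
The event trigger can be sketched in a few lines: reset the surrogate's dataset whenever a new observation falls outside the model's confidence band. In this hedged sketch a running Gaussian estimate stands in for the GP posterior purely to keep the example self-contained; it is not the paper's algorithm verbatim.

```python
# Event-triggered change detection with a running Gaussian surrogate.
import math


class EventTriggeredSurrogate:
    def __init__(self, beta=3.0):
        self.ys, self.beta = [], beta

    def update(self, y):
        if len(self.ys) >= 2:
            mu = sum(self.ys) / len(self.ys)
            var = sum((v - mu) ** 2 for v in self.ys) / (len(self.ys) - 1)
            if abs(y - mu) > self.beta * math.sqrt(var + 1e-12):
                self.ys = []   # change detected: forget stale data
                print("event triggered: objective changed, dataset reset")
        self.ys.append(y)


model = EventTriggeredSurrogate()
for y in [1.0, 1.1, 0.9, 1.05, 4.0, 4.1, 3.9]:   # objective shifts mid-stream
    model.update(y)
```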

References

Showing 1-10 of 73 references.

Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits

This work introduces the first provably efficient PBT-style algorithm, Population-Based Bandits (PB2), which uses a probabilistic model to guide the search in an efficient way, making it possible to discover high performing hyperparameter configurations with far fewer agents than typically required by PBT.
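
The probabilistic model at PB2's core replaces PBT's random perturbation with a GP-bandit suggestion. Below is a hedged numpy sketch of a GP-UCB suggestion step over one normalized hyperparameter; the fixed RBF kernel and noise level are simplifying assumptions, and PB2's time-varying kernel and parallel batch selection are omitted.

```python
# GP-UCB suggestion step: fit a GP to (hyperparameter, reward-change) data
# and pick the candidate maximizing mean + beta * stddev.
import numpy as np


def rbf(a, b, ls=0.3):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)


def gp_ucb_suggest(X, y, candidates, beta=2.0, noise=1e-3):
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(candidates, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return candidates[np.argmax(mu + beta * np.sqrt(np.maximum(var, 0)))]


X = np.array([0.1, 0.4, 0.7])   # past learning rates (normalized)
y = np.array([0.2, 0.9, 0.4])   # observed reward improvements
cands = np.linspace(0, 1, 101)
print("next lr to try:", gp_ucb_suggest(X, y, cands))
```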

On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning

This work demonstrates that this problem can be tackled effectively with automated HPO, and shows that tuning several MBRL hyperparameters dynamically, i.e. during training itself, further improves performance compared to hyperparameters kept static for the whole training run.

Sample-Efficient Automated Deep Reinforcement Learning

A population-based automated RL (AutoRL) framework that meta-optimizes arbitrary off-policy RL algorithms, tuning the hyperparameters and the neural architecture while simultaneously training the agent; sharing the collected experience across the population substantially increases the sample efficiency of the meta-optimization.

A Self-Tuning Actor-Critic Algorithm

This paper applies the algorithm, Self-Tuning Actor-Critic (STAC), to self-tune all the differentiable hyperparameters of an actor-critic loss function, to discover auxiliary tasks, and to improve off-policy learning using a novel leaky V-trace operator.
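
The meta-gradient idea behind self-tuning can be shown with a toy problem: a differentiable hyperparameter (here a single loss coefficient eta) is updated by gradient descent on an outer objective evaluated after an inner parameter update. This is purely illustrative and not the STAC losses themselves.

```python
# Toy meta-gradient update of one differentiable hyperparameter `eta`.
import numpy as np

theta, eta = 0.0, 0.5          # inner parameter, tunable hyperparameter
alpha, meta_lr = 0.1, 0.05

for _ in range(100):
    # Inner loss: eta weights a "good" term (target 1) vs a "bad" one.
    grad_theta = eta * 2 * (theta - 1.0) + (1 - eta) * 2 * (theta + 1.0)
    theta_new = theta - alpha * grad_theta
    # Outer objective: distance to the true target theta* = 1; its
    # gradient w.r.t. eta flows through theta_new via the chain rule.
    dtheta_new_deta = -alpha * (2 * (theta - 1.0) - 2 * (theta + 1.0))
    grad_eta = 2 * (theta_new - 1.0) * dtheta_new_deta
    theta, eta = theta_new, np.clip(eta - meta_lr * grad_eta, 0.0, 1.0)

print(f"theta={theta:.3f}, eta={eta:.3f}")   # eta drifts toward 1
```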

Automatic Data Augmentation for Generalization in Deep Reinforcement Learning

This paper compares three approaches for automatically finding an appropriate augmentation and shows that the resulting agent outperforms other baselines specifically designed to improve generalization in RL, and learns policies and representations that are more robust to changes in the environment that do not affect the agent.

Fast Efficient Hyperparameter Tuning for Policy Gradients

This paper proposes Hyperparameter Optimisation on the Fly (HOOF), a gradient-free algorithm that requires no more than one training run to automatically adapt the hyperparameters that directly affect the policy update through the gradient.
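
HOOF's core trick is that candidate hyperparameter settings are scored without new rollouts, by weighted importance sampling (WIS) of the would-be updated policies on the batch already collected. The sketch below uses a one-dimensional Gaussian bandit policy as a stand-in for the real policy-gradient setting; all quantities are illustrative.

```python
# WIS scoring of candidate learning rates on a single batch, HOOF-style.
import numpy as np

rng = np.random.default_rng(0)
mu = 0.0                                   # current policy mean
actions = rng.normal(mu, 1.0, size=256)    # one batch of rollouts
returns = -(actions - 2.0) ** 2            # reward peaks at action = 2


def logp(a, mean):
    return -0.5 * (a - mean) ** 2          # log N(mean, 1) up to a constant


grad = np.mean((actions - mu) * returns)   # vanilla policy gradient
best_lr, best_val = None, -np.inf
for lr in [0.01, 0.05, 0.1, 0.5]:          # candidate learning rates
    mu_cand = mu + lr * grad               # would-be updated policy
    w = np.exp(logp(actions, mu_cand) - logp(actions, mu))
    wis = np.sum(w * returns) / np.sum(w)  # WIS estimate of candidate value
    if wis > best_val:
        best_lr, best_val = lr, wis

print("chosen learning rate:", best_lr)
```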

A Generalized Framework for Population Based Training

This work proposes a general, black-box PBT framework that distributes many asynchronous "trials" (a small number of training steps with warm-starting) across a cluster, coordinated by the PBT controller, and shows that the system achieves better accuracy and faster convergence compared to existing methods, given the same computational resource.
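
The "many asynchronous trials with warm-starting" pattern can be sketched with a thread pool standing in for a cluster: the controller hands out short training chunks that warm-start from the current best checkpoint and keeps whichever result wins. All names here are illustrative, not the framework's API.

```python
# Asynchronous warm-started trials coordinated by a simple controller.
import random
from concurrent.futures import ThreadPoolExecutor, as_completed

best = {"score": -1.0, "checkpoint": 0.0, "lr": 0.01}


def trial(warm_start, lr):
    # A "trial": a few training steps from a warm-started checkpoint.
    ckpt = warm_start + lr * random.random()
    return {"score": -(ckpt - 1.0) ** 2, "checkpoint": ckpt, "lr": lr}


for generation in range(5):
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [
            pool.submit(trial, best["checkpoint"],
                        best["lr"] * random.choice([0.5, 1.0, 2.0]))
            for _ in range(4)
        ]
        for f in as_completed(futures):
            result = f.result()
            if result["score"] > best["score"]:
                best = result              # controller keeps the best trial

print("best trial:", best)
```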

Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules

This paper introduces a new data augmentation algorithm, Population Based Augmentation (PBA), which generates nonstationary augmentation policy schedules instead of a fixed augmentation policy.

Improving Generalization in Reinforcement Learning with Mixture Regularization

This work introduces a simple approach, named mixreg, which trains agents on a mixture of observations from different training environments and imposes linearity constraints on the observation interpolations and the corresponding supervision; this increases data diversity more effectively and helps learn smoother policies.
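
The mixing step is essentially mixup applied across environments: observations from two environments are linearly interpolated, and the same coefficients are applied to the supervision signal. A minimal sketch, assuming image observations and value targets (illustrative shapes, not the paper's code):

```python
# mixreg-style linear mixing of observations and matching supervision.
import numpy as np

rng = np.random.default_rng(0)
obs_a = rng.random((32, 64, 64, 3))   # batch from environment A
obs_b = rng.random((32, 64, 64, 3))   # batch from environment B
val_a, val_b = rng.random(32), rng.random(32)   # value targets

lam = rng.beta(0.2, 0.2, size=32)               # mixing coefficients
lam_obs = lam[:, None, None, None]
mixed_obs = lam_obs * obs_a + (1 - lam_obs) * obs_b
mixed_val = lam * val_a + (1 - lam) * val_b     # same linear constraint

print(mixed_obs.shape, mixed_val.shape)
```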

Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization

A novel algorithm, Hyperband, is introduced, formulating hyperparameter optimization as a pure-exploration non-stochastic infinite-armed bandit problem where a predefined resource like iterations, data samples, or features is allocated to randomly sampled configurations.
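
At Hyperband's heart is the successive-halving subroutine: give a small budget to many random configurations, keep the top fraction, and repeat with a larger budget. The sketch below shows a single such bracket (Hyperband itself runs several brackets trading off breadth against budget); the evaluate function is a noisy stand-in for real training.

```python
# One successive-halving bracket, the subroutine Hyperband builds on.
import random


def evaluate(config, budget):
    # Stand-in for "train `config` with `budget` resource, report loss";
    # noise shrinks as the budget grows.
    return (config - 0.7) ** 2 + random.gauss(0, 0.1) / budget


def successive_halving(n=27, min_budget=1, eta=3):
    configs = [random.random() for _ in range(n)]
    budget = min_budget
    while len(configs) > 1:
        ranked = sorted(configs, key=lambda c: evaluate(c, budget))
        configs = ranked[: max(1, len(configs) // eta)]  # keep top 1/eta
        budget *= eta                                    # grow the budget
    return configs[0]


print("best config:", successive_halving())
```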
...