• Corpus ID: 62841559

# Fast Efficient Hyperparameter Tuning for Policy Gradients

@article{Paul2019FastEH,
  title={Fast Efficient Hyperparameter Tuning for Policy Gradients},
  author={Supratik Paul and Vitaly Kurin and Shimon Whiteson},
  journal={ArXiv},
  year={2019},
  volume={abs/1902.06583}
}
• Published 18 February 2019
• Computer Science
• ArXiv
The performance of policy gradient methods is sensitive to hyperparameter settings that must be tuned for any new application. Widely used grid search methods for tuning hyperparameters are sample inefficient and computationally expensive. More advanced methods like Population Based Training that learn optimal schedules for hyperparameters instead of fixed settings can yield better results, but are also sample inefficient and computationally expensive. In this paper, we propose Hyperparameter…
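The grid search the abstract criticizes is easy to sketch, and the sketch makes the sample-inefficiency argument concrete: every combination requires a full training run from scratch, so cost grows multiplicatively with each hyperparameter. Here `train_and_evaluate` is a hypothetical toy surrogate, not the paper's method; in practice each call would cost millions of environment steps.

```python
from itertools import product

# Hypothetical stand-in for a full policy-gradient training run;
# in practice each call costs millions of environment steps.
def train_and_evaluate(lr, gamma):
    # Toy surrogate objective: peaks at lr=1e-3, gamma=0.99.
    return -((lr - 1e-3) ** 2) * 1e6 - ((gamma - 0.99) ** 2) * 100

# Grid search: every combination is trained independently, so the
# number of runs grows multiplicatively with each hyperparameter added.
learning_rates = [1e-4, 3e-4, 1e-3, 3e-3]
gammas = [0.95, 0.99, 0.999]

best = max(product(learning_rates, gammas),
           key=lambda cfg: train_and_evaluate(*cfg))
print(best)  # → (0.001, 0.99), after 4 * 3 = 12 full "training runs"
```

Twelve runs for just two hyperparameters is the baseline that schedule-learning methods like Population Based Training (and the proposal in this paper) aim to beat.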
## 18 Citations

Online Hyper-parameter Tuning in Off-policy Learning via Evolutionary Strategies
• Computer Science
ArXiv
• 2020
This work proposes a framework which entails the application of Evolutionary Strategies to online hyper-parameter tuning in off-policy learning, and shows that this method outperforms state-of-the-art off-policy learning baselines with static hyper-parameters and recent prior work over a wide range of continuous control benchmarks.
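The basic Evolution Strategies estimator behind such approaches can be illustrated on a toy one-dimensional problem: perturb the current hyperparameter with Gaussian noise and move along the noise directions weighted by how well each perturbation performed, which approximates a gradient without requiring one analytically. `performance` is a hypothetical surrogate, not the cited paper's setup.

```python
import random

random.seed(0)

# Toy surrogate: "performance" of a hyperparameter x, maximised at x = 3.
def performance(x):
    return -(x - 3.0) ** 2

# Basic Evolution Strategies step: sample Gaussian perturbations and
# average performance-weighted noise, giving a derivative-free
# estimate of the performance gradient.
def es_step(x, sigma=0.1, alpha=0.05, n=20):
    grad = 0.0
    for _ in range(n):
        eps = random.gauss(0.0, 1.0)
        grad += performance(x + sigma * eps) * eps
    return x + alpha * grad / (n * sigma)

x = 0.0
for _ in range(300):
    x = es_step(x)
# x now sits close to the optimum at 3.0
```

The same loop applies unchanged when `performance` is the return of an off-policy agent trained for one interval, which is what makes ES attractive for online tuning.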
A Self-Tuning Actor-Critic Algorithm
• Computer Science
NeurIPS
• 2020
This paper applies the algorithm, Self-Tuning Actor-Critic (STAC), to self-tune all the differentiable hyperparameters of an actor-critic loss function, to discover auxiliary tasks, and to improve off-policy learning using a novel leaky V-trace operator.
Tuning Mixed Input Hyperparameters on the Fly for Efficient Population Based AutoRL
• Computer Science
NeurIPS
• 2021
A new (provably) efficient hierarchical approach is proposed for optimizing both continuous and categorical variables, using a new time-varying bandit algorithm specifically designed for the population-based training regime.
Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits
• Computer Science
NeurIPS
• 2020
This work introduces the first provably efficient PBT-style algorithm, Population-Based Bandits (PB2), which uses a probabilistic model to guide the search in an efficient way, making it possible to discover high performing hyperparameter configurations with far fewer agents than typically required by PBT.
One-Shot Bayes Opt with Probabilistic Population Based Training
• Computer Science
ArXiv
• 2020
This work shows that Probabilistic Population-Based Training is able to achieve high performance with only a small population size, making it useful for all researchers regardless of their computational resources.
Towards Automatic Actor-Critic Solutions to Continuous Control
• Computer Science
ArXiv
• 2021
This paper creates an evolutionary approach that automatically tunes these design decisions and eliminates the RL-specific hyperparameters from the Soft Actor-Critic algorithm, and shows that this agent outperforms well-tuned hyperparameter settings in popular benchmarks from the DeepMind Control Suite.
Optimizing hyperparameters of deep reinforcement learning for autonomous driving based on whale optimization algorithm
• Computer Science
PloS one
• 2021
A swarm-based optimization algorithm, namely the Whale Optimization Algorithm (WOA), is employed for optimizing the hyperparameters of the Deep Deterministic Policy Gradient (DDPG) algorithm to achieve the optimum control strategy in an autonomous driving control problem.
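The Whale Optimization Algorithm's two characteristic moves, encircling the best candidate and spiraling around it, can be sketched on a toy objective. The sphere function below is a hypothetical stand-in for evaluating a DDPG hyperparameter configuration, and this is a simplified per-dimension variant, not the cited paper's implementation.

```python
import math
import random

random.seed(42)

def sphere(x):
    # Toy objective standing in for one full DDPG training-and-evaluation run.
    return sum(v * v for v in x)

def woa(obj, dim=2, n_whales=20, iters=200, lo=-10.0, hi=10.0):
    """Minimal Whale Optimization Algorithm sketch (toy, simplified variant)."""
    whales = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_whales)]
    best = min(whales, key=obj)[:]
    for t in range(iters):
        a = 2.0 - 2.0 * t / iters          # 'a' decays linearly from 2 to 0
        for w in whales:
            r, l = random.random(), random.uniform(-1, 1)
            A, C = 2 * a * r - a, 2 * random.random()
            for d in range(dim):
                if random.random() < 0.5:
                    if abs(A) < 1:          # exploit: encircle the best ("prey")
                        w[d] = best[d] - A * abs(C * best[d] - w[d])
                    else:                   # explore: move relative to a random whale
                        rnd = random.choice(whales)
                        w[d] = rnd[d] - A * abs(C * rnd[d] - w[d])
                else:                       # bubble-net: spiral around the best
                    D = abs(best[d] - w[d])
                    w[d] = D * math.exp(l) * math.cos(2 * math.pi * l) + best[d]
                w[d] = min(max(w[d], lo), hi)
        cand = min(whales, key=obj)
        if obj(cand) < obj(best):
            best = cand[:]
    return best

best = woa(sphere)
```

Because `a` decays to zero, the swarm shifts from exploration to pure exploitation over time, pulling all whales toward the incumbent best configuration.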
Automated Reinforcement Learning (AutoRL): A Survey and Open Problems
• Computer Science
J. Artif. Intell. Res.
• 2022
This survey seeks to unify the field of AutoRL, provide a common taxonomy, discuss each area in detail and pose open problems of interest to researchers going forward.
Towards accelerated robotic deployment by supervised learning of latent space observer and policy from simulated experiments with expert policies
• Computer Science
2020 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM)
• 2020
This work develops a novel sim2real architecture for converting simulated low-level sensor-data policies to high-level real-world policies, and provides a proof of concept by simulating a simple low-cost manipulator in PyBullet to pick and place an object based on image observations.
Two-Level Lattice Neural Network Architectures for Control of Nonlinear Systems
• Computer Science, Mathematics
2020 59th IEEE Conference on Decision and Control (CDC)
• 2020
This paper considers the problem of automatically designing a Rectified Linear Unit (ReLU) Neural Network (NN) architecture with the guarantee that it is sufficiently parametrized to control a nonlinear system and utilizes the authors’ recent results on the Two-Level Lattice NN architecture.

## References

SHOWING 1-10 OF 55 REFERENCES
Scalable Gradient-Based Tuning of Continuous Regularization Hyperparameters
• Computer Science
ICML
• 2016
The approach for tuning regularization hyperparameters is explored and it is found that in experiments on MNIST, SVHN and CIFAR-10, the resulting regularization levels are within the optimal regions.
Population Based Training of Neural Networks
• Computer Science
ArXiv
• 2017
Population Based Training is presented, a simple asynchronous optimisation algorithm which effectively utilises a fixed computational budget to jointly optimise a population of models and their hyperparameters to maximise performance.
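The exploit-and-explore loop at the heart of Population Based Training can be sketched with a toy surrogate: train each population member for an interval, copy the hyperparameters and state of top performers into the worst performers, then perturb the copied hyperparameters. `train_step` is a hypothetical stand-in for a real training interval, and the scalar score stands in for network weights.

```python
import random

random.seed(1)

# Hypothetical surrogate for one training interval: performance improves
# only when the learning rate is reasonably close to 0.1.
def train_step(score, lr):
    return score + max(0.0, 1.0 - abs(lr - 0.1) * 5.0)

def pbt(pop_size=8, rounds=20):
    """Toy Population Based Training loop: train, then exploit + explore."""
    pop = [{"lr": random.uniform(0.001, 0.5), "score": 0.0} for _ in range(pop_size)]
    for _ in range(rounds):
        for m in pop:                          # train each member for an interval
            m["score"] = train_step(m["score"], m["lr"])
        pop.sort(key=lambda m: m["score"], reverse=True)
        top, bottom = pop[: pop_size // 4], pop[-(pop_size // 4):]
        for loser in bottom:
            winner = random.choice(top)
            loser["score"] = winner["score"]   # exploit: copy a better member's state
            loser["lr"] = winner["lr"]
            loser["lr"] *= random.choice([0.8, 1.2])  # explore: perturb the hyperparameter
    return max(pop, key=lambda m: m["score"])

best = pbt()
```

The perturbation step is what lets PBT learn a hyperparameter *schedule* rather than a fixed setting: the population's learning rates drift over training toward whatever value currently yields the fastest improvement.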
Hyperparameter optimization with approximate gradient
• Computer Science
ICML
• 2016
This work proposes an algorithm for the optimization of continuous hyperparameters using inexact gradient information, and gives sufficient conditions for the global convergence of this method based on regularity conditions of the involved functions and summability of errors.
Where Did My Optimum Go?: An Empirical Analysis of Gradient Descent Optimization in Policy Gradient Methods
• Computer Science
ArXiv
• 2018
An empirical analysis of the effects that a wide range of gradient descent optimizers and their hyperparameters have on policy gradient methods, a subset of Deep RL algorithms, for benchmark continuous control tasks finds that adaptive optimizers have a narrow window of effective learning rates, diverging in other cases, and that the effectiveness of momentum varies depending on the properties of the environment.
Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets
• Computer Science
AISTATS
• 2017
A generative model for the validation error as a function of training set size is proposed, which learns during the optimization process and allows exploration of preliminary configurations on small subsets, by extrapolating to the full dataset.
Practical Bayesian Optimization of Machine Learning Algorithms
• Computer Science
NIPS
• 2012
This work describes new algorithms that take into account the variable cost of learning algorithm experiments and that can leverage the presence of multiple cores for parallel experimentation and shows that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms.
Multi-Task Bayesian Optimization
• Computer Science
NIPS
• 2013
This paper proposes an adaptation of a recently developed acquisition function, entropy search, to the cost-sensitive, multi-task setting and demonstrates the utility of this new acquisition function by leveraging a small dataset to explore hyper-parameter settings for a large dataset.
A novel objective function for optimizing $\lambda$ as a function of state rather than time is contributed, which represents a concrete step towards black-box application of temporal-difference learning methods in real world problems.