Corpus ID: 62841559

Fast Efficient Hyperparameter Tuning for Policy Gradients

@article{Paul2019FastEH,
  title={Fast Efficient Hyperparameter Tuning for Policy Gradients},
  author={Supratik Paul and Vitaly Kurin and Shimon Whiteson},
  journal={ArXiv},
  year={2019},
  volume={abs/1902.06583}
}
The performance of policy gradient methods is sensitive to hyperparameter settings that must be tuned for any new application. Widely used grid search methods for tuning hyperparameters are sample inefficient and computationally expensive. More advanced methods like Population Based Training that learn optimal schedules for hyperparameters instead of fixed settings can yield better results, but are also sample inefficient and computationally expensive. In this paper, we propose Hyperparameter… 
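To make the cost concrete: grid search pays for one complete training run per configuration, so its environment-sample bill multiplies with the grid size. A minimal sketch of this accounting (all names and numbers here are hypothetical, not from the paper):

```python
# Sketch of why grid search over policy-gradient hyperparameters is
# sample-inefficient: each configuration requires its own full training run.
from itertools import product

def train_policy_gradient(lr, discount, n_iters=100, samples_per_iter=2_000):
    """Hypothetical training loop; returns (final_return, env_samples_used)."""
    # ... collect rollouts, estimate the policy gradient, update the policy ...
    return 0.0, n_iters * samples_per_iter  # placeholder return value

grid = {"lr": [1e-4, 3e-4, 1e-3], "discount": [0.95, 0.99, 0.995]}
total_samples, best = 0, None
for lr, discount in product(grid["lr"], grid["discount"]):
    final_return, used = train_policy_gradient(lr, discount)
    total_samples += used
    if best is None or final_return > best[0]:
        best = (final_return, {"lr": lr, "discount": discount})

print(f"{total_samples:,} environment samples for 9 configurations")  # 1,800,000
```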

Citations of this paper

Online Hyper-parameter Tuning in Off-policy Learning via Evolutionary Strategies
This work proposes a framework that applies Evolutionary Strategies to online hyper-parameter tuning in off-policy learning, and shows that the method outperforms state-of-the-art off-policy learning baselines with static hyper-parameters, as well as recent prior work, over a wide range of continuous control benchmarks.
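For intuition, hyper-parameter search with Evolutionary Strategies can be sketched as perturbing a hyper-parameter vector and moving its mean toward better-scoring perturbations. A minimal NES-style sketch (the toy `score` function and all constants are assumptions, not this paper's method):

```python
import numpy as np

rng = np.random.default_rng(0)

def score(hypers):
    # Hypothetical stand-in for the return achieved by an off-policy learner
    # run with this hyper-parameter vector.
    return -np.sum((hypers - np.array([0.3, 0.7])) ** 2)

mean, sigma, lr, pop = np.array([0.5, 0.5]), 0.1, 0.05, 16
for _ in range(200):
    noise = rng.normal(size=(pop, 2))                 # perturbation directions
    returns = np.array([score(mean + sigma * n) for n in noise])
    adv = (returns - returns.mean()) / (returns.std() + 1e-8)
    mean += lr / (pop * sigma) * (noise.T @ adv)      # ES gradient estimate
print("tuned hypers:", mean)                          # approaches [0.3, 0.7]
```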
A Self-Tuning Actor-Critic Algorithm
This paper presents Self-Tuning Actor-Critic (STAC), an algorithm that self-tunes all the differentiable hyperparameters of an actor-critic loss function, discovers auxiliary tasks, and improves off-policy learning using a novel leaky V-trace operator.
Tuning Mixed Input Hyperparameters on the Fly for Efficient Population Based AutoRL
This work proposes a new, provably efficient hierarchical approach for optimizing both continuous and categorical variables, using a new time-varying bandit algorithm specifically designed for the population-based training regime.
Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits
This work introduces the first provably efficient PBT-style algorithm, Population-Based Bandits (PB2), which uses a probabilistic model to guide the search in an efficient way, making it possible to discover high performing hyperparameter configurations with far fewer agents than typically required by PBT.
One-Shot Bayes Opt with Probabilistic Population Based Training
This work shows that Probabilistic Population-Based Training is able to achieve high performance with only a small population size, making it useful for all researchers regardless of their computational resources.
Towards Automatic Actor-Critic Solutions to Continuous Control
This paper creates an evolutionary approach that automatically tunes design decisions and eliminates the RL-specific hyperparameters from the Soft Actor-Critic algorithm, and shows that the resulting agent outperforms well-tuned hyperparameter settings on popular benchmarks from the DeepMind Control Suite.
Optimizing hyperparameters of deep reinforcement learning for autonomous driving based on whale optimization algorithm
A swarm-based optimization algorithm, namely the Whale Optimization Algorithm (WOA), is employed for optimizing the hyperparameters of the Deep Deterministic Policy Gradient (DDPG) algorithm to achieve the optimum control strategy in an autonomous driving control problem.
Automated Reinforcement Learning (AutoRL): A Survey and Open Problems
This survey seeks to unify the field of AutoRL, provide a common taxonomy, discuss each area in detail and pose open problems of interest to researchers going forward.
Towards accelerated robotic deployment by supervised learning of latent space observer and policy from simulated experiments with expert policies
This work moves towards a novel sim2real architecture for converting simulated low-level sensor-data policies to high-level real-world policies, providing a proof of concept by simulating a simple low-cost manipulator in PyBullet that picks and places an object based on image observations.
Two-Level Lattice Neural Network Architectures for Control of Nonlinear Systems
This paper considers the problem of automatically designing a Rectified Linear Unit (ReLU) Neural Network (NN) architecture with the guarantee that it is sufficiently parametrized to control a nonlinear system, utilizing the authors' recent results on the Two-Level Lattice NN architecture.
...

References

Showing 1-10 of 55 references
Scalable Gradient-Based Tuning of Continuous Regularization Hyperparameters
An approach for tuning regularization hyperparameters by gradient-based optimization is explored; in experiments on MNIST, SVHN, and CIFAR-10, the resulting regularization levels are found to lie within the optimal regions.
Population Based Training of Neural Networks
Population Based Training is presented, a simple asynchronous optimisation algorithm which effectively utilises a fixed computational budget to jointly optimise a population of models and their hyperparameters to maximise performance.
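A minimal sketch of the exploit/explore round PBT is usually described with (the population layout and the quartile rule below are illustrative assumptions, not the paper's exact schedule):

```python
import copy
import random

def pbt_step(population):
    """One hypothetical exploit/explore round of Population Based Training.

    Each member is a dict with 'weights', 'hypers', and a 'score'."""
    ranked = sorted(population, key=lambda m: m["score"])
    cutoff = max(1, len(ranked) // 4)
    for weak in ranked[:cutoff]:                  # bottom quartile of the population
        strong = random.choice(ranked[-cutoff:])  # a member of the top quartile
        weak["weights"] = copy.deepcopy(strong["weights"])  # exploit: copy weights
        weak["hypers"] = {k: v * random.choice([0.8, 1.2])  # explore: perturb hypers
                          for k, v in strong["hypers"].items()}
    return population
```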
Hyperparameter optimization with approximate gradient
This work proposes an algorithm for the optimization of continuous hyperparameters using inexact gradient information and gives sufficient conditions for the global convergence of this method, based on regularity conditions of the involved functions and summability of errors.
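As a toy stand-in for the idea (the paper derives its inexact gradient via its own analysis; central finite differences here are only for illustration), one can descend an approximate hypergradient of a validation loss with respect to a ridge penalty:

```python
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.normal(size=(50, 5)), rng.normal(size=50)     # training split
Av, bv = rng.normal(size=(20, 5)), rng.normal(size=20)   # validation split

def inner_solve(lam):
    """Inner problem: ridge-regression weights for penalty lam."""
    return np.linalg.solve(A.T @ A + lam * np.eye(5), A.T @ b)

def val_loss(lam):
    return np.mean((Av @ inner_solve(lam) - bv) ** 2)

lam, lr, eps = 1.0, 0.5, 1e-4
for _ in range(50):
    g = (val_loss(lam + eps) - val_loss(lam - eps)) / (2 * eps)  # approximate hypergradient
    lam = max(lam - lr * g, 1e-6)                                # keep the penalty positive
print("tuned lambda:", lam)
```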
Where Did My Optimum Go?: An Empirical Analysis of Gradient Descent Optimization in Policy Gradient Methods
This work presents an empirical analysis of the effects that a wide range of gradient descent optimizers and their hyperparameters have on policy gradient methods, a subset of deep RL algorithms, on benchmark continuous control tasks; it finds that adaptive optimizers have a narrow window of effective learning rates and diverge otherwise, and that the effectiveness of momentum varies with the properties of the environment.
Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets
A generative model for the validation error as a function of training set size is proposed, which learns during the optimization process and allows exploration of preliminary configurations on small subsets, by extrapolating to the full dataset.
Practical Bayesian Optimization of Machine Learning Algorithms
This work describes new algorithms that take into account the variable cost of learning algorithm experiments and that can leverage the presence of multiple cores for parallel experimentation and shows that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms.
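For context, the cost-aware, parallel procedures in this work build on the generic Bayesian-optimization loop: fit a Gaussian-process surrogate to past evaluations, then pick the next point with an acquisition function such as expected improvement. A plain sketch with a toy one-dimensional objective (not the paper's algorithms):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def f(x):
    # Hypothetical objective standing in for validation performance.
    return -(x - 0.3) ** 2

X = np.array([[0.0], [0.5], [1.0]])          # initial design points
y = np.array([f(x[0]) for x in X])

for _ in range(10):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    cand = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
    mu, sigma = gp.predict(cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - y.max()) / sigma
    ei = (mu - y.max()) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
    x_next = cand[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next[0]))

print("best x:", X[np.argmax(y)][0])          # approaches 0.3
```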
Multi-Task Bayesian Optimization
This paper proposes an adaptation of a recently developed acquisition function, entropy search, to the cost-sensitive, multi-task setting and demonstrates the utility of this new acquisition function by leveraging a small dataset to explore hyper-parameter settings for a large dataset.
Gradient-Based Optimization of Hyperparameters
This article presents a methodology to optimize several hyper-parameters based on the computation of the gradient of a model selection criterion with respect to the hyper-parameters; this hyper-parameter gradient involves second derivatives of the training criterion.
A Greedy Approach to Adapting the Trace Parameter for Temporal Difference Learning
A novel objective function for optimizing $\lambda$ as a function of state rather than time is contributed, which represents a concrete step towards black-box application of temporal-difference learning methods in real world problems.
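To make the setting concrete, here is a tabular TD(lambda) sketch in which lambda varies with the visited state rather than being one global constant (the update rule is standard; the paper's greedy objective for choosing lambda per state is not implemented here):

```python
import numpy as np

n_states, gamma, alpha = 5, 0.9, 0.1
V = np.zeros(n_states)                   # value estimates
e = np.zeros(n_states)                   # accumulating eligibility trace
lam = np.linspace(0.2, 0.9, n_states)    # lambda(s): one value per state

def td_lambda_update(s, r, s_next, done=False):
    global e
    delta = r + (0.0 if done else gamma * V[s_next]) - V[s]
    e *= gamma * lam[s]                  # decay traces with lambda of the current state
    e[s] += 1.0
    V[:] += alpha * delta * e            # update every traced state

# Toy usage: a random walk with reward 1 for exiting at the right edge.
rng = np.random.default_rng(0)
for _ in range(500):
    s = n_states // 2
    e[:] = 0.0
    while True:
        s_next = s + rng.choice([-1, 1])
        done = s_next < 0 or s_next >= n_states
        reward = float(s_next >= n_states)
        td_lambda_update(s, reward, min(max(s_next, 0), n_states - 1), done)
        if done:
            break
        s = s_next
```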
Policy Gradient Methods for Reinforcement Learning with Function Approximation
This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.
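For reference, the policy gradient theorem associated with this paper is commonly written as follows, with $d^\pi$ the discounted state distribution and $Q^\pi$ the action-value function:

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{s \sim d^\pi,\, a \sim \pi_\theta}
      \left[ \nabla_\theta \log \pi_\theta(a \mid s) \, Q^\pi(s, a) \right]
```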
...