Corpus ID: 3545204

Stochastic Hyperparameter Optimization through Hypernetworks

@article{Lorraine2018StochasticHO,
  title={Stochastic Hyperparameter Optimization through Hypernetworks},
  author={Jonathan Lorraine and David Kristjanson Duvenaud},
  journal={ArXiv},
  year={2018},
  volume={abs/1802.09419}
}
Machine learning models are usually tuned by nesting optimization of model weights inside the optimization of hyperparameters. We give a method to collapse this nested optimization into joint stochastic optimization of both weights and hyperparameters. Our method trains a neural network to output approximately optimal weights as a function of hyperparameters. We show that our method converges to locally optimal weights and hyperparameters for sufficiently large hypernets. We compare this method… 
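To make the joint optimization concrete, the sketch below is a minimal, hypothetical Python/PyTorch illustration of the idea: a hypernetwork maps a hyperparameter (here, a log L2 penalty) to model weights, the hypernetwork is trained on hyperparameters sampled near the current value, and the hyperparameter itself is updated on the validation loss through the hypernetwork's output. All names (`Hypernet`, `train_loss`, `val_loss`) and training details are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of stochastic hyperparameter optimization with a hypernetwork.
# Not the authors' code; a toy linear model with an L2 penalty is assumed.
import torch

class Hypernet(torch.nn.Module):
    """Maps a hyperparameter (log L2 penalty) to a flat weight vector."""
    def __init__(self, n_weights, hidden=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(1, hidden), torch.nn.Tanh(),
            torch.nn.Linear(hidden, n_weights))

    def forward(self, lam):
        return self.net(lam)

def train_loss(w, lam, x, y):
    """Regularized training loss for a linear model (illustrative)."""
    return ((x @ w - y) ** 2).mean() + torch.exp(lam[0]) * (w ** 2).sum()

def val_loss(w, x, y):
    """Unregularized validation loss."""
    return ((x @ w - y) ** 2).mean()

n_features = 10
hyper = Hypernet(n_weights=n_features)
lam = torch.zeros(1, requires_grad=True)              # hyperparameter being tuned
opt_w = torch.optim.Adam(hyper.parameters(), lr=1e-3)
opt_lam = torch.optim.Adam([lam], lr=1e-2)

x_tr, y_tr = torch.randn(256, n_features), torch.randn(256)
x_va, y_va = torch.randn(64, n_features), torch.randn(64)

for step in range(1000):
    # Hypernet step: fit approximately optimal weights for a hyperparameter
    # sampled near the current value (a local best-response approximation).
    lam_sample = (lam + 0.1 * torch.randn(1)).detach()
    loss_w = train_loss(hyper(lam_sample), lam_sample, x_tr, y_tr)
    opt_w.zero_grad(); loss_w.backward(); opt_w.step()

    # Hyperparameter step: descend the validation loss, differentiating
    # through the hypernet's predicted weights.
    loss_lam = val_loss(hyper(lam), x_va, y_va)
    opt_lam.zero_grad(); loss_lam.backward(); opt_lam.step()
```

The two updates are interleaved rather than nested, which is the collapse of the inner training loop that the abstract describes; the perturbation scale 0.1 and all architecture choices here are arbitrary.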

Citations

Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions
TLDR
This work aims to adapt regularization hyperparameters for neural networks by fitting compact approximations to the best-response function, which maps hyperparameters to optimal weights and biases, and outperforms competing hyperparameter optimization methods on large-scale deep learning problems.
Online hyperparameter optimization by real-time recurrent learning
TLDR
This framework takes advantage of the analogy between hyperparameter optimization and parameter learning in recurrent neural networks (RNNs) and adapts a well-studied family of online learning algorithms for RNNs to tune hyperparameters and network parameters simultaneously, without repeatedly rolling out iterative optimization.
Optimizing Millions of Hyperparameters by Implicit Differentiation
TLDR
An algorithm for inexpensive gradient-based hyperparameter optimization that combines the implicit function theorem (IFT) with efficient inverse Hessian approximations is proposed and used to train modern network architectures with millions of weights and millions of hyperparameters.
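For context, the IFT-based hypergradient this summary refers to can be written as below (notation ours; the truncated Neumann series shown is the kind of inexpensive inverse-Hessian approximation meant, stated here as an assumption rather than a quotation of the paper). With training loss $\mathcal{L}_T$, validation loss $\mathcal{L}_V$, and weights $w^*(\lambda)$ minimizing $\mathcal{L}_T$ at hyperparameters $\lambda$:

```latex
\frac{d\mathcal{L}_V}{d\lambda}
  = \frac{\partial \mathcal{L}_V}{\partial \lambda}
  - \frac{\partial \mathcal{L}_V}{\partial w}
    \left[ \frac{\partial^2 \mathcal{L}_T}{\partial w \, \partial w^{\top}} \right]^{-1}
    \frac{\partial^2 \mathcal{L}_T}{\partial w \, \partial \lambda},
\qquad
\left[ \frac{\partial^2 \mathcal{L}_T}{\partial w \, \partial w^{\top}} \right]^{-1}
  \approx \alpha \sum_{j=0}^{K}
    \left( I - \alpha \frac{\partial^2 \mathcal{L}_T}{\partial w \, \partial w^{\top}} \right)^{j}
```

Each term of the series needs only Hessian-vector products, which is what makes the hypergradient tractable at the scale of millions of weights.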
An Empirical Study of Neural Network Hyperparameters
TLDR
This study provides an overview of commonly used hyperparameters of learning algorithms for training neural networks, along with an analysis of adaptive learning algorithms used for tuning learning rates.
Principled Weight Initialization for Hypernetworks
Hypernetworks are meta neural networks that generate weights for a main neural network in an end-to-end differentiable manner. Despite extensive applications ranging from multi-task learning to…
Learning to Mutate with Hypergradient Guided Population
TLDR
A hyperparameter mutation (HPM) algorithm is proposed that explicitly considers a learnable trade-off between global and local search: a population of student models simultaneously explores the hyperparameter space guided by hypergradients, while a teacher model is leveraged to mutate the underperforming students by exploiting the top ones.
Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation
TLDR
This work extends existing methods to develop an approximate hypergradient-based hyperparameter optimiser which is applicable to any continuous hyperparameter appearing in a differentiable model weight update, yet requires only one training episode, with no restarts.
Generating Neural Networks with Neural Networks
TLDR
This work formulates the hypernetwork training objective as a compromise between accuracy and diversity, where diversity takes into account trivial symmetry transformations of the target network, and explains how the objective relates to variational inference.
A Gradient-based Bilevel Optimization Approach for Tuning Hyperparameters in Machine Learning
TLDR
This paper proposes a bilevel solution method for the hyperparameter optimization problem that does not suffer from the drawbacks of earlier studies, and an extensive computational study on two datasets confirms the efficiency of the proposed method.
...

References

Showing 1-10 of 30 references
Gradient-based Hyperparameter Optimization through Reversible Learning
TLDR
This work computes exact gradients of cross-validation performance with respect to all hyperparameters by chaining derivatives backwards through the entire training procedure, which allows us to optimize thousands of hyperparameters, including step-size and momentum schedules, weight initialization distributions, richly parameterized regularization schemes, and neural network architectures.
Hyperparameter optimization with approximate gradient
TLDR
This work proposes an algorithm for the optimization of continuous hyperparameters using inexact gradient information and gives sufficient conditions for the global convergence of this method, based on regularity conditions of the involved functions and summability of errors.
Scalable Gradient-Based Tuning of Continuous Regularization Hyperparameters
TLDR
The approach for tuning regularization hyperparameters is explored and it is found that in experiments on MNIST, SVHN and CIFAR-10, the resulting regularization levels are within the optimal regions.
Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization
TLDR
A novel algorithm is introduced, Hyperband, for hyperparameter optimization as a pure-exploration non-stochastic infinite-armed bandit problem where a predefined resource like iterations, data samples, or features is allocated to randomly sampled configurations.
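As a rough illustration of the resource-allocation idea in this summary, the following is a hedged Python sketch of successive halving, the subroutine Hyperband runs repeatedly under different trade-offs between the number of configurations and the per-configuration budget; `sample_config` and `evaluate` are hypothetical stand-ins, not the paper's API.

```python
# Hedged sketch of successive halving (the core subroutine of Hyperband).
import random

def sample_config():
    # Hypothetical search space: a learning rate drawn on a log scale.
    return {"lr": 10 ** random.uniform(-5, -1)}

def evaluate(config, budget):
    # Stand-in for "train `config` with `budget` resources, return val loss".
    return abs(config["lr"] - 1e-3) / budget

def successive_halving(n=27, r=1, eta=3):
    """Start with n configs and budget r each; keep the best 1/eta and
    multiply the budget by eta until one configuration remains."""
    configs = [sample_config() for _ in range(n)]
    while len(configs) > 1:
        ranked = sorted(configs, key=lambda c: evaluate(c, r))
        configs = ranked[: max(1, len(configs) // eta)]  # keep the top 1/eta
        r *= eta                                         # give survivors more budget
    return configs[0]

print(successive_halving())
```

Hyperband itself loops over several (n, r) pairs, hedging between trying many configurations briefly and a few configurations thoroughly.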
Efficient Hyperparameter Optimization and Infinitely Many Armed Bandits
TLDR
This work introduces Hyperband for hyperparameter optimization as a pure-exploration non-stochastic infinitely many armed bandit problem where allocation of additional resources to an arm corresponds to training a configuration on larger subsets of the data.
Practical Bayesian Optimization of Machine Learning Algorithms
TLDR
This work describes new algorithms that take into account the variable cost of learning-algorithm experiments and can leverage multiple cores for parallel experimentation, and shows that the proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms.
Forward and Reverse Gradient-Based Hyperparameter Optimization
We study two procedures (reverse-mode and forward-mode) for computing the gradient of the validation error with respect to the hyperparameters of any iterative learning algorithm such as stochastic gradient descent…
HyperNetworks
This work explores hypernetworks: an approach of using one network, also known as a hypernetwork, to generate the weights for another network. We apply hypernetworks to generate adaptive weights for recurrent networks…
DrMAD: Distilling Reverse-Mode Automatic Differentiation for Optimizing Hyperparameters of Deep Neural Networks
TLDR
This work proposes a simple but effective method, DrMAD, to distill the knowledge of the forward pass into a shortcut path, through which the authors approximately reverse the training trajectory.
SMASH: One-Shot Model Architecture Search through HyperNetworks
TLDR
A technique to accelerate architecture selection by learning an auxiliary HyperNet that generates the weights of a main model conditioned on that model's architecture is proposed, achieving competitive performance with similarly-sized hand-designed networks.
...