• Corpus ID: 632197

Practical Bayesian Optimization of Machine Learning Algorithms

Jasper Snoek, H. Larochelle, Ryan P. Adams
The use of machine learning algorithms frequently involves careful tuning of learning parameters and model hyperparameters. […] We describe new algorithms that take into account the variable cost (duration) of learning algorithm experiments and that can leverage the presence of multiple cores for parallel experimentation. We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms including latent…
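As a rough illustration of what "taking into account the variable cost (duration)" can mean, the sketch below divides the closed-form expected improvement (for minimization, under a Gaussian predictive distribution) by a predicted run time, so that cheap experiments are favored when their expected gain per second is higher. The function names, the candidate values, and the predicted durations are all hypothetical and chosen only for illustration; this is a minimal sketch, not the authors' implementation.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, best):
    """Expected improvement over `best` when minimizing, given a
    Gaussian prediction with mean `mu` and standard deviation `sigma`."""
    if sigma <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    return sigma * (z * normal_cdf(z) + normal_pdf(z))


def ei_per_second(mu, sigma, best, predicted_seconds):
    """Expected improvement divided by predicted duration
    (assumes predicted_seconds > 0)."""
    return expected_improvement(mu, sigma, best) / predicted_seconds


# Hypothetical candidates: (name, predicted error mean, predicted std, predicted seconds)
candidates = [
    ("slow_but_promising", 0.20, 0.05, 600.0),
    ("fast_and_uncertain", 0.25, 0.10, 60.0),
]
best_so_far = 0.25  # lowest validation error observed so far (assumed)

ranked_by_ei = sorted(
    candidates,
    key=lambda c: expected_improvement(c[1], c[2], best_so_far),
    reverse=True,
)
ranked_by_ei_per_second = sorted(
    candidates,
    key=lambda c: ei_per_second(c[1], c[2], best_so_far, c[3]),
    reverse=True,
)
```

With these made-up numbers, plain expected improvement prefers the slow candidate, while the cost-aware score prefers the fast one, which is the kind of trade-off a duration-aware criterion is meant to expose.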

An Empirical Bayes Approach to Optimizing Machine Learning Algorithms
The resulting approach, empirical Bayes for hyperparameter averaging (EB-Hyp), predicts held-out data better than Bayesian optimization in two experiments on latent Dirichlet allocation and deep latent Gaussian models.
Meta-Learning Acquisition Functions for Bayesian Optimization
This work proposes a method to meta-learn customized optimizers within the well-established framework of Bayesian optimization (BO), allowing the algorithm to utilize the proven generalization capabilities of Gaussian processes.
Automatic Hyperparameter Tuning of Machine Learning Models under Time Constraints
This paper proposes to take the execution time of each trial into account to find an optimal or suboptimal hyperparameter configuration faster than other Bayesian optimization-based approaches in terms of execution time.
Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets
A generative model for the validation error as a function of training set size is proposed, which learns during the optimization process and allows exploration of preliminary configurations on small subsets, by extrapolating to the full dataset.
Hyperparameter Optimization: Foundations, Algorithms, Best Practices and Open Challenges
This work gives practical recommendations regarding important choices to be made when conducting HPO, including the HPO algorithms themselves, performance evaluation, how to combine HPO with machine learning pipelines, runtime improvements, and parallelization.
Towards Automatic Bayesian Optimization: A first step involving acquisition functions
This paper proposes a first attempt at automatic Bayesian optimization by exploring several heuristics that automatically tune the acquisition function of Bayesian optimization, and illustrates the effectiveness of these heuristics on a set of benchmark problems and a hyperparameter tuning problem for a machine learning algorithm.
Practical Bayesian optimization with application to tuning machine learning algorithms
This thesis motivates and introduces the model-based optimization framework and provides some historical context to the technique that dates back as far as 1933 with application to clinical drug trials, with important directions for future research.
Adaptive Local Bayesian Optimization Over Multiple Discrete Variables
Empirical evaluations demonstrate that the step-wise approach of team KAIST OSI outperforms the existing methods across different tasks and exceeds the baseline algorithms by up to 20.39%.
Weighting Is Worth the Wait: Bayesian Optimization with Importance Sampling
Casting hyperparameter search as a multi-task Bayesian optimization problem over both hyperparameters and importance sampling design achieves the best of both worlds: by learning a parameterization of IS that trades off evaluation complexity and quality, it improves upon Bayesian optimization state-of-the-art runtime and final validation error across a variety of datasets and complex neural architectures.
Faster & More Reliable Tuning of Neural Networks: Bayesian Optimization with Importance Sampling
This work proposes to accelerate tuning of neural networks in a robust way by taking into account the relative amount of information contributed by each training example, and leverages importance sampling (IS) to do so, which results in more reliable performance of the method in less wall-clock time.


Algorithms for Hyper-Parameter Optimization
This work contributes novel techniques for making response surface models P(y|x) in which many elements of hyper-parameter assignment (x) are known to be irrelevant given particular values of other elements.
Convergence Rates of Efficient Global Optimization Algorithms
A. Bull. Computer Science, Mathematics. J. Mach. Learn. Res., 2011.
This work provides convergence rates for expected improvement, and proposes alternative estimators, chosen to minimize the constants in the rate of convergence, and shows these estimators retain the convergence rates of a fixed prior.
Gaussian Processes for Machine Learning
The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics, and deals with the supervised learning problem for both regression and classification.
Self-Paced Learning for Latent Variable Models
A novel, iterative self-paced learning algorithm where each iteration simultaneously selects easy samples and learns a new parameter vector that outperforms the state of the art method for learning a latent structural SVM on four applications.
Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design
This work analyzes GP-UCB, an intuitive upper-confidence based algorithm, and bounds its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design and obtaining explicit sublinear regret bounds for many commonly used covariance functions.
Sequential Model-Based Optimization for General Algorithm Configuration
This paper extends the explicit regression models paradigm for the first time to general algorithm configuration problems, allowing many categorical parameters and optimization for sets of instances, and yields state-of-the-art performance.
A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning
A tutorial on Bayesian optimization, a method of finding the maximum of expensive cost functions using the Bayesian technique of setting a prior over the objective function and combining it with evidence to get a posterior function.
Selecting Receptive Fields in Deep Networks
This paper proposes a fast method to choose local receptive fields that group together those low-level features that are most similar to each other according to a pairwise similarity metric, and produces results showing how this method allows even simple unsupervised training algorithms to train successful multi-layered networks that achieve state-of-the-art results on CIFAR and STL datasets.
Bayesian calibration of computer models
A Bayesian calibration technique which improves on this traditional approach in two respects and attempts to correct for any inadequacy of the model which is revealed by a discrepancy between the observed data and the model predictions from even the best‐fitting parameter values is presented.
Learning Multiple Layers of Features from Tiny Images
It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.