• Corpus ID: 319311

Efficient Transfer Learning Method for Automatic Hyperparameter Tuning

Dani Yogatama and Gideon S. Mann
International Conference on Artificial Intelligence and Statistics
We propose a fast and effective algorithm for automatic hyperparameter tuning that can generalize across datasets. The time complexity of reconstructing the response surface at every SMBO iteration in our method is linear in the number of trials (significantly less than previous work with comparable performance), allowing the method to realistically scale to many more datasets. Specifically, we use deviations from the per-dataset mean as the response values. We empirically show the superiority…
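
As a rough illustration of the response transformation mentioned in the abstract, the sketch below replaces each trial's raw score with its deviation from the mean score on the same dataset. It covers only this centering step, not the paper's surrogate model or SMBO loop; the function name `center_by_dataset` and the `(dataset_id, hyperparams, score)` trial layout are assumptions made for the example.

```python
from collections import defaultdict


def center_by_dataset(trials):
    """Replace each score with its deviation from the per-dataset mean.

    `trials` is a list of (dataset_id, hyperparams, score) tuples.
    This layout is hypothetical and chosen only for illustration.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    # First pass: accumulate the sum and count of scores per dataset.
    for dataset_id, _, score in trials:
        totals[dataset_id] += score
        counts[dataset_id] += 1
    means = {d: totals[d] / counts[d] for d in totals}
    # Second pass: subtract the dataset mean from every score.
    return [(d, h, s - means[d]) for d, h, s in trials]


trials = [
    ("a", {"lr": 0.1}, 0.8),
    ("a", {"lr": 0.01}, 0.6),
    ("b", {"lr": 0.1}, 0.3),
    ("b", {"lr": 0.01}, 0.5),
]
centered = center_by_dataset(trials)
```

Both passes touch each trial once, so this step is linear in the number of trials. Centering makes scores from datasets with very different baseline accuracies comparable before a shared response surface is fit.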

Hyperparameter Transfer Learning through Surrogate Alignment for Efficient Deep Neural Network Training

The proposed method uses surrogates to model the hyperparameter-error distributions of the two datasets and trains a neural network to learn the transfer function, which demonstrates the efficiency of the method.

Transferable Neural Processes for Hyperparameter Optimization

This work proposes an efficient end-to-end HPO algorithm named Transfer Neural Processes (TNP), which achieves transfer learning by incorporating trials on other datasets, initializing the model with well-generalized parameters, and learning an initial set of hyperparameters to evaluate.

Two-Stage Transfer Surrogate Model for Automatic Hyperparameter Optimization

This work presents a model that transfers knowledge of an algorithm's performance on other data sets to accelerate hyperparameter optimization for a new data set, and that outperforms state-of-the-art methods.

Meta-learning Hyperparameter Performance Prediction with Neural Processes

This work proposes an end-to-end surrogate named Transfer Neural Processes (TNP) that learns a comprehensive set of meta-knowledge, including the parameters of historical surrogates, historical trials, and initial configurations for other datasets.

Beyond Manual Tuning of Hyperparameters

This work discusses two strategies towards making machine learning algorithms more autonomous: automated optimization of hyperparameters (including mechanisms for feature selection, preprocessing, model selection, etc.) and the development of algorithms with reduced sets of hyperparameters.

A simple transfer-learning extension of Hyperband

This paper proposes a model-based extension of Hyperband, replacing the uniform random sampling of HP candidates by an adaptive non-uniform sampling procedure, and applies the method to the problem of tuning the learning rate when solving linear regression problems and to the optimization of the HPs of XGBoost binary classifiers across different datasets.

Sequential Model-Free Hyperparameter Tuning

This work adapts the sequential model-based optimization by replacing its surrogate model and acquisition function with one policy that is optimized for the task of hyperparameter tuning and proposes a similarity measure for data sets that yields more comprehensible results than those using meta-features.

Using Meta-Learning to Initialize Bayesian Optimization of Hyperparameters

This work explores the possibility of speeding up SMBO by transferring knowledge from previous optimization runs on similar datasets, and shows that initializing SMBO with a small number of configurations suggested by a metalearning procedure mildly improves the state of the art in low-dimensional hyperparameter optimization.

Transfer Learning based Search Space Design for Hyperparameter Tuning

This work introduces an automatic method to design the BO search space with the aid of tuning history from past tasks, which considerably boosts BO by designing a promising and compact search space instead of using the entire space, and outperforms the state of the art on a wide range of benchmarks.

Scalable Gaussian process-based transfer surrogates for hyperparameter optimization

This work proposes to learn individual surrogate models on the observations of each data set and then combine all surrogates into a joint one using ensembling techniques, and extends the framework to directly estimate the acquisition function in the same setting, using a novel technique which is named the “transfer acquisition function”.



Collaborative hyperparameter tuning

A generic method to incorporate knowledge from previous experiments when simultaneously tuning a learning algorithm on new problems is proposed and demonstrated in two experiments, where it outperforms standard tuning techniques and single-problem surrogate-based optimization.

Multi-Task Bayesian Optimization

This paper proposes an adaptation of a recently developed acquisition function, entropy search, to the cost-sensitive, multi-task setting and demonstrates the utility of this new acquisition function by leveraging a small dataset to explore hyper-parameter settings for a large dataset.

Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms

This work considers the problem of simultaneously selecting a learning algorithm and setting its hyperparameters, going beyond previous work that attacks these issues separately, and shows classification performance often much better than using standard selection and hyperparameter optimization methods.

Algorithms for Hyper-Parameter Optimization

This work contributes novel techniques for making response surface models P(y|x) in which many elements of hyper-parameter assignment (x) are known to be irrelevant given particular values of other elements.

Practical Bayesian Optimization of Machine Learning Algorithms

This work describes new algorithms that take into account the variable cost of learning algorithm experiments and that can leverage the presence of multiple cores for parallel experimentation and shows that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms.

Regularization and variable selection via the elastic net

It is shown that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation, and an algorithm called LARS‐EN is proposed for computing elastic net regularization paths efficiently, much like algorithm LARS does for the lasso.

Cutting-plane training of structural SVMs

This paper explores how cutting-plane methods can provide fast training not only for classification SVMs, but also for structural SVMs and presents an extensive empirical evaluation of the method applied to binary classification, multi-class classification, HMM sequence tagging, and CFG parsing.

Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design

This work analyzes GP-UCB, an intuitive upper-confidence based algorithm, and bounds its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design and obtaining explicit sublinear regret bounds for many commonly used covariance functions.

Portfolio Allocation for Bayesian Optimization

A portfolio of acquisition functions governed by an online multi-armed bandit strategy is proposed, the best of which is called GP-Hedge, and it is shown that this method outperforms the best individual acquisition function.

Gaussian Processes for Machine Learning

The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics, and deals with the supervised learning problem for both regression and classification.