Corpus ID: 319311

Efficient Transfer Learning Method for Automatic Hyperparameter Tuning

@inproceedings{Yogatama2014EfficientTL,
  title={Efficient Transfer Learning Method for Automatic Hyperparameter Tuning},
  author={Dani Yogatama and Gideon S. Mann},
  booktitle={AISTATS},
  year={2014}
}
We propose a fast and effective algorithm for automatic hyperparameter tuning that can generalize across datasets. [...] Key Method: The time complexity of reconstructing the response surface at every SMBO iteration in our method is linear in the number of trials (significantly less than previous work with comparable performance), allowing the method to realistically scale to many more datasets. Specifically, we use deviations from the per-dataset mean as the response values. We empirically show the superiority…
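
A minimal sketch of the transfer idea described above, in Python: past trials from other datasets are pooled after subtracting each dataset's mean error, and a surrogate fit on those deviations guides the next SMBO suggestion. The scikit-learn GP, the Matern kernel, the expected-improvement acquisition, and the helper names are illustrative assumptions; in particular, an off-the-shelf GP has cubic fitting cost and does not reproduce the paper's linear-time surrogate reconstruction.

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def pooled_deviations(trials_per_dataset):
    # trials_per_dataset: list of (configs, errors) pairs, one per previous dataset.
    # Response values are deviations from each dataset's own mean error.
    X, y = [], []
    for configs, errors in trials_per_dataset:
        X.append(configs)
        y.append(errors - errors.mean())
    return np.vstack(X), np.concatenate(y)

def expected_improvement(gp, candidates, y_best):
    # Standard expected improvement for minimizing the (centered) error.
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def suggest_next(trials_per_dataset, new_trials, candidates):
    # One SMBO step: fit a single surrogate on pooled, mean-centered trials
    # (previous datasets plus the new one) and pick the candidate with highest EI.
    X_old, y_old = pooled_deviations(trials_per_dataset)
    configs_new, errors_new = new_trials
    X = np.vstack([X_old, configs_new])
    y_new = errors_new - errors_new.mean()
    y = np.concatenate([y_old, y_new])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    ei = expected_improvement(gp, candidates, y_new.min())
    return candidates[np.argmax(ei)]
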
Hyperparameter Transfer Learning through Surrogate Alignment for Efficient Deep Neural Network Training
TLDR
The proposed method uses surrogates to model the hyperparameter-error distributions of the two datasets and trains a neural network to learn the transfer function between them; experiments demonstrate the efficiency of the method.
Transferable Neural Processes for Hyperparameter Optimization
TLDR
An end-to-end and efficient HPO algorithm named Transfer Neural Processes (TNP) is proposed, which achieves transfer learning by incorporating trials on other datasets, initializing the model with well-generalized parameters, and learning an initial set of hyperparameters to evaluate.
Two-Stage Transfer Surrogate Model for Automatic Hyperparameter Optimization
TLDR
This work presents a model that transfers knowledge of an algorithm's performance on other data sets to automatically accelerate hyperparameter optimization for a new data set, outperforming state-of-the-art methods.
Meta-learning Hyperparameter Performance Prediction with Neural Processes
TLDR
This work proposes an end-to-end surrogate named Transfer Neural Processes (TNP) that learns a comprehensive set of meta-knowledge, including the parameters of historical surrogates, historical trials, and initial configurations for other datasets.
Learning to Transfer Initializations for Bayesian Hyperparameter Optimization
TLDR
A Siamese network with convolutional layers followed by bi-directional LSTM layers is constructed to learn meta-features, which are used to select a few datasets similar to the new dataset, so that a set of configurations from those similar datasets can be adopted as initializations for Bayesian hyperparameter optimization.
Beyond Manual Tuning of Hyperparameters
TLDR
This work discusses two strategies towards making machine learning algorithms more autonomous: automated optimization of hyperparameters (including mechanisms for feature selection, preprocessing, model selection, etc.) and the development of algorithms with reduced sets of hyperparameters.
A simple transfer-learning extension of Hyperband
TLDR
This paper proposes a model-based extension of Hyperband, replacing the uniform random sampling of HP candidates by an adaptive non-uniform sampling procedure, and applies the method to the problem of tuning the learning rate when solving linear regression problems and to the optimization of the HPs of XGBoost binary classifiers across different datasets.
Sequential Model-Free Hyperparameter Tuning
TLDR
This work adapts sequential model-based optimization by replacing its surrogate model and acquisition function with a single policy optimized for the task of hyperparameter tuning, and proposes a similarity measure for data sets that yields more comprehensible results than those based on meta-features.
Using Meta-Learning to Initialize Bayesian Optimization of Hyperparameters
TLDR
The possibility of speeding up SMBO by transferring knowledge from previous optimization runs on similar datasets is explored; initializing SMBO with a small number of configurations suggested by a meta-learning procedure mildly improves the state of the art in low-dimensional hyperparameter optimization.
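
A minimal sketch of this warm-start idea, assuming dataset meta-features and per-dataset rankings of past configurations are already available; the function name, Euclidean distance, and default counts are illustrative, not the paper's exact procedure.

import numpy as np

def warm_start_configs(meta_features, best_configs, new_meta, k=2, per_dataset=1):
    # meta_features: (n_datasets, d) array, one meta-feature vector per previous dataset.
    # best_configs: list (one entry per dataset) of configurations sorted best-first.
    # Returns configurations to evaluate before regular SMBO takes over.
    dists = np.linalg.norm(meta_features - new_meta, axis=1)
    nearest = np.argsort(dists)[:k]                       # k most similar datasets
    return [cfg for i in nearest for cfg in best_configs[i][:per_dataset]]
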
Scalable Gaussian process-based transfer surrogates for hyperparameter optimization
TLDR
This work proposes to learn individual surrogate models on the observations of each data set and then combine all surrogates into a joint one using ensembling techniques, and extends the framework to directly estimate the acquisition function in the same setting, using a novel technique which is named the “transfer acquisition function”.

References

Showing 1-10 of 21 references
Collaborative hyperparameter tuning
TLDR
A generic method is proposed to incorporate knowledge from previous experiments when simultaneously tuning a learning algorithm on new problems at hand; it is demonstrated in two experiments where it outperforms standard tuning techniques and single-problem surrogate-based optimization.
Multi-Task Bayesian Optimization
TLDR
This paper proposes an adaptation of a recently developed acquisition function, entropy search, to the cost-sensitive, multi-task setting and demonstrates the utility of this new acquisition function by leveraging a small dataset to explore hyper-parameter settings for a large dataset.
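
For context, transfer in this multi-task setting typically rests on a multi-task Gaussian process whose covariance factorizes into a task covariance times an input covariance; below is a minimal sketch of such a kernel. The squared-exponential input kernel and the function name are assumed choices, and the paper's cost-sensitive entropy-search acquisition is not reproduced here.

import numpy as np

def multitask_kernel(X1, t1, X2, t2, K_task, lengthscale=1.0):
    # k((x, t), (x', t')) = K_task[t, t'] * k_x(x, x') with a squared-exponential k_x.
    # X1: (n1, d) inputs, t1: (n1,) integer task indices; likewise X2, t2.
    # K_task: (T, T) positive semi-definite task covariance shared across inputs.
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    k_x = np.exp(-0.5 * sq_dists / lengthscale ** 2)
    return K_task[np.ix_(t1, t2)] * k_x
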
Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms
TLDR
This work considers the problem of simultaneously selecting a learning algorithm and setting its hyperparameters, going beyond previous work that attacks these issues separately, and shows classification performance often much better than that of standard selection and hyperparameter optimization methods.
Algorithms for Hyper-Parameter Optimization
TLDR
This work contributes novel techniques for making response surface models P(y|x) in which many elements of hyper-parameter assignment (x) are known to be irrelevant given particular values of other elements.
Practical Bayesian Optimization of Machine Learning Algorithms
TLDR
This work describes new algorithms that take into account the variable cost of learning-algorithm experiments and can leverage multiple cores for parallel experimentation, and shows that the proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms.
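
A sketch of cost-aware acquisition in the spirit of the "expected improvement per second" idea from this line of work: expected improvement divided by each candidate's predicted runtime. The helper signature and the source of the runtime prediction (e.g., a second surrogate on log-durations) are assumptions.

import numpy as np
from scipy.stats import norm

def ei_per_second(mu, sigma, y_best, predicted_seconds):
    # mu, sigma: surrogate posterior mean/std of the error at each candidate (minimization).
    # predicted_seconds: predicted training time for each candidate.
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu) / sigma
    ei = (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    return ei / np.maximum(predicted_seconds, 1e-9)
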
Regularization and variable selection via the elastic net
Summary. We propose the elastic net, a new regularization and variable selection method. Real world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a…
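
For context, the elastic net combines the lasso's L1 penalty with a ridge-style L2 penalty, exposing a penalty strength and a mixing ratio as hyperparameters; a minimal scikit-learn sketch follows. The synthetic data and parameter values are arbitrary illustrations, not results from the paper.

import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]                 # sparse ground-truth coefficients
y = X @ w_true + 0.1 * rng.normal(size=100)

# alpha (penalty strength) and l1_ratio (L1/L2 mix) are the two hyperparameters to tune.
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(np.flatnonzero(np.abs(model.coef_) > 1e-3))   # indices of selected variables
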
Cutting-plane training of structural SVMs
TLDR
This paper explores how cutting-plane methods can provide fast training not only for classification SVMs, but also for structural SVMs and presents an extensive empirical evaluation of the method applied to binary classification, multi-class classification, HMM sequence tagging, and CFG parsing.
Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design
TLDR
This work analyzes GP-UCB, an intuitive upper-confidence based algorithm, and bounds its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design and obtaining explicit sublinear regret bounds for many commonly used covariance functions.
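
A minimal sketch of the GP-UCB selection rule over a finite candidate set, using the beta schedule from the paper's regret analysis; the surrogate is assumed to expose a scikit-learn-style predict(..., return_std=True), and the maximization convention is an illustrative choice.

import numpy as np

def gp_ucb_select(gp, candidates, t, delta=0.1):
    # Pick the candidate maximizing mu(x) + sqrt(beta_t) * sigma(x).
    # beta_t = 2 * log(|D| * t^2 * pi^2 / (6 * delta)) for a finite candidate set D.
    n_candidates = candidates.shape[0]
    beta = 2.0 * np.log(n_candidates * (t ** 2) * (np.pi ** 2) / (6.0 * delta))
    mu, sigma = gp.predict(candidates, return_std=True)
    return candidates[np.argmax(mu + np.sqrt(beta) * sigma)]
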
Portfolio Allocation for Bayesian Optimization
TLDR
A portfolio of acquisition functions governed by an online multi-armed bandit strategy is proposed; the best-performing variant, called GP-Hedge, is shown to outperform the best individual acquisition function.
Gaussian Processes for Machine Learning
TLDR
The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics, and deals with the supervised learning problem for both regression and classification.