Random Search for Hyper-Parameter Optimization


Many machine learning algorithms have hyper-parameters: flags, values, and other configuration information that guides the algorithm. Sometimes this configuration applies to the space of functions that the learning algorithm searches (e.g. the number of nearest neighbours to use in KNN). Sometimes it applies to the way in which the search is conducted (e.g. the step size in stochastic gradient descent). For better or for worse, it is common practice to judge a learning algorithm by its best-case-scenario performance: researchers are expected to maximize the performance of their algorithm by optimizing over hyper-parameter values, e.g. by cross-validating using data withheld from the training set.

Despite decades of research into global optimization (e.g. [8, 4, 9, 10]) and the publication of several hyper-parameter optimization algorithms (e.g. [7, 1, 3]), it seems that most machine learning researchers still prefer to carry out this optimization by hand and by grid search (e.g. [6, 5, 2]). Here, we argue on theoretical and experimental grounds that grid search (i.e. lattice-based brute-force search) should almost never be used. Instead, quasi-random or even pseudo-random experiment designs (random experiments) should be preferred. Random experiments are just as easy to parallelize as grid search, just as simple to design, and more reliable. Looking forward, we would like to investigate sequential hyper-parameter optimization algorithms, and we hope that random search will serve as a credible baseline.

Does random search work better? We did an experiment (Fig. 1) similar to [5] using random search instead of grid search. We op…

[Figure 1: x-axis "# trials" with ticks 1, 2, 4, 8, 16, 32; y-axis from 0.0 to 1.0.]
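The random-experiment design described above can be sketched in a few lines: draw each hyper-parameter independently from its range, evaluate, and keep the best setting. The sketch below is illustrative only; `toy_objective`, the parameter names, and the search ranges are hypothetical stand-ins (not from the paper) for a cross-validated score.

```python
import math
import random


def random_search(objective, space, n_trials, seed=0):
    """Minimal random search: sample n_trials independent settings
    from `space` and return the best (params, score) pair found."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Each hyper-parameter is drawn independently from its own range.
        params = {name: sampler(rng) for name, sampler in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(p):
    # Hypothetical stand-in for cross-validated accuracy; it peaks
    # near learning_rate = 0.1 and n_neighbors = 8.
    return (math.exp(-(math.log10(p["learning_rate"]) + 1.0) ** 2)
            * math.exp(-((p["n_neighbors"] - 8) / 8.0) ** 2))


space = {
    # Log-uniform draw over [1e-4, 1]: scale-type parameters are usually
    # sampled on a log scale rather than a linear one.
    "learning_rate": lambda rng: 10 ** rng.uniform(-4, 0),
    "n_neighbors": lambda rng: rng.randint(1, 32),
}

params, score = random_search(toy_objective, space, n_trials=32)
print(params, score)
```

Note that, unlike a grid, each trial here explores a fresh value of every hyper-parameter, and the trial budget (`n_trials`) can be chosen freely rather than being forced to a product of per-axis grid sizes; the independent trials are also trivially parallelizable.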



Semantic Scholar estimates that this publication has 1,151 citations based on the available data.


Cite this paper

@article{Bergstra2012RandomSF,
  title   = {Random Search for Hyper-Parameter Optimization},
  author  = {James Bergstra and Yoshua Bengio},
  journal = {Journal of Machine Learning Research},
  year    = {2012},
  volume  = {13},
  pages   = {281-305}
}