Corpus ID: 552516

Hyperparameter optimization with approximate gradient

@article{Pedregosa2016HyperparameterOW,
  title={Hyperparameter optimization with approximate gradient},
  author={Fabian Pedregosa},
  journal={ArXiv},
  year={2016},
  volume={abs/1602.02355}
}
Most models in machine learning contain at least one hyperparameter to control for model complexity. Choosing an appropriate set of hyperparameters is both crucial in terms of model accuracy and computationally challenging. In this work we propose an algorithm for the optimization of continuous hyperparameters using inexact gradient information. An advantage of this method is that hyperparameters can be updated before model parameters have fully converged. We also give sufficient conditions for…
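To make the idea in the abstract concrete, here is a minimal sketch of one way to update a continuous hyperparameter from an inexact gradient. It is not the paper's implementation. Everything specific is an assumption chosen for illustration: the ridge regression inner problem, the parametrization of the penalty as exp(lam), the use of implicit differentiation to obtain the hypergradient, the conjugate gradient solver, the geometrically decreasing tolerance schedule, the fixed step size, and all function names.

```python
import numpy as np


def conjugate_gradient(A, b, x0, tol, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.

    Stops as soon as the (recursively updated) residual norm is <= tol,
    so a loose tol yields an inexact solution.
    """
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol:
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


def validation_loss(w, X_val, y_val):
    """Outer objective: half the mean squared error on held-out data."""
    return 0.5 * np.mean((X_val @ w - y_val) ** 2)


def approx_hypergradient(lam, X_tr, y_tr, X_val, y_val, w0, tol):
    """Approximate d(validation loss)/d(lam) for ridge regression.

    Inner problem (assumed): min_w 0.5*||X_tr w - y_tr||^2 + 0.5*exp(lam)*||w||^2.
    Both the inner problem and the linear system from implicit
    differentiation are solved only up to residual norm tol.
    """
    reg = np.exp(lam)
    H = X_tr.T @ X_tr + reg * np.eye(X_tr.shape[1])  # inner Hessian
    # Inexact inner solve, warm-started from the previous weights w0.
    w = conjugate_gradient(H, X_tr.T @ y_tr, w0, tol)
    # Gradient of the outer objective with respect to w.
    grad_val = X_val.T @ (X_val @ w - y_val) / X_val.shape[0]
    # Inexact solve of H q = grad_val.
    q = conjugate_gradient(H, grad_val, np.zeros_like(w), tol)
    # Implicit function theorem: dw/dlam = -reg * H^{-1} w.
    return -reg * (w @ q), w


def optimize_hyperparameter(X_tr, y_tr, X_val, y_val, lam0=0.0,
                            step=0.1, n_outer=30, tol0=1e-1, decay=0.8):
    """Gradient descent on lam with a tolerance that tightens each iteration."""
    lam = lam0
    w = np.zeros(X_tr.shape[1])
    for k in range(n_outer):
        tol = tol0 * decay ** k  # assumed schedule: loose early, tight later
        grad, w = approx_hypergradient(lam, X_tr, y_tr, X_val, y_val, w, tol)
        lam -= step * grad
    return lam, w
```

In this sketch the early outer iterations are cheap because the inner solver stops at a loose tolerance, which mirrors the abstract's point that hyperparameters can be updated before the model parameters have fully converged. The step size and schedule here are hand-picked placeholders and would need tuning on a real problem.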
141 Citations
A Gradient-based Bilevel Optimization Approach for Tuning Hyperparameters in Machine Learning
  • 2
Implicit differentiation for fast hyperparameter selection in non-smooth convex learning
  • Highly Influenced
Online Hyperparameter Search Interleaved with Proximal Parameter Updates
Optimizing Large-Scale Hyperparameters via Automated Learning Algorithm
  • Highly Influenced
ING STRUCTURED BEST-RESPONSE FUNCTIONS
  • 1
Optimizing Millions of Hyperparameters by Implicit Differentiation
  • 41
  • Highly Influenced
Bilevel Programming for Hyperparameter Optimization and Meta-Learning
  • 176
  • Highly Influenced
