• Corpus ID: 233715000

# Implicit differentiation for fast hyperparameter selection in non-smooth convex learning

@article{Bertrand2021ImplicitDF,
title={Implicit differentiation for fast hyperparameter selection in non-smooth convex learning},
author={Quentin Bertrand and Quentin Klopfenstein and Mathurin Massias and Mathieu Blondel and Samuel Vaiter and Alexandre Gramfort and Joseph Salmon},
journal={ArXiv},
year={2021},
volume={abs/2105.01637}
}
• Published 4 May 2021
• Computer Science, Mathematics
• ArXiv
Finding the optimal hyperparameters of a model can be cast as a bilevel optimization problem, typically solved using zero-order techniques. In this work we study first-order methods when the inner optimization problem is convex but non-smooth. We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yields sequences of Jacobians converging toward the exact Jacobian. Using implicit differentiation, we show it is possible to leverage the non…
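The forward-mode idea in the abstract can be sketched on the Lasso: run proximal gradient descent (ISTA) and differentiate each iteration with respect to the regularization parameter, so a Jacobian sequence is propagated alongside the iterates. A minimal NumPy sketch under these assumptions (the function name and setup are illustrative, not the paper's code):

```python
import numpy as np

def lasso_forward_diff(X, y, lam, n_iter=3000):
    """Forward-mode differentiation of ISTA for the Lasso:
    jointly iterates the solution beta and its derivative
    d beta / d lam with respect to the regularization parameter."""
    n, p = X.shape
    L = np.linalg.norm(X, ord=2) ** 2          # Lipschitz constant of the gradient
    beta = np.zeros(p)
    dbeta = np.zeros(p)                        # Jacobian w.r.t. the scalar lam
    for _ in range(n_iter):
        z = beta - X.T @ (X @ beta - y) / L    # gradient step
        dz = dbeta - X.T @ (X @ dbeta) / L     # its derivative w.r.t. lam
        t = lam / L
        supp = np.abs(z) > t                   # where soft-thresholding is smooth
        beta = np.where(supp, z - np.sign(z) * t, 0.0)
        # chain rule through soft-thresholding: d/dz = 1 on the support,
        # d/dt = -sign(z) on the support, and dt/dlam = 1/L
        dbeta = np.where(supp, dz - np.sign(z) / L, 0.0)
    return beta, dbeta
```

The Jacobian iterate can be checked against a finite-difference approximation of the converged solution, which is the convergence statement the abstract refers to.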
7 Citations

## Figures and Tables from this paper

Value Function Based Difference-of-Convex Algorithm for Bilevel Hyperparameter Selection Problems
• Computer Science
ICML
• 2022
This work develops a sequentially convergent Value Function based Difference-of-Convex Algorithm with inexactness (VF-iDCA) and shows that this algorithm achieves stationary solutions without LLSC (lower-level strong convexity) and LLS (lower-level singleton) assumptions for bilevel programs from a broad class of hyperparameter tuning applications.
Nonsmooth Implicit Differentiation for Machine Learning and Optimization
• Computer Science, Mathematics
NeurIPS
• 2021
A nonsmooth implicit function theorem with an operational calculus is established, and several applications are provided, such as training deep equilibrium networks, training neural nets with conic optimization layers, or hyperparameter tuning for nonsmooth Lasso-type models.
Bilevel Optimization with a Lower-level Contraction: Optimal Sample Complexity without Warm-Start
• Computer Science
ArXiv
• 2022
A simple method is proposed which uses stochastic fixed-point iterations at the lower level and projected inexact gradient descent at the upper level, reaching an ε-stationary point using O(ε^{-2}) and Õ(ε^{-1}) samples for the stochastic and the deterministic setting, respectively.
Efficient and Modular Implicit Differentiation
• Computer Science
ArXiv
• 2021
This paper proposes automatic implicit differentiation, an efficient and modular approach for implicit differentiation of optimization problems, and shows the ease of formulating and solving bi-level optimization problems using the framework.
Electromagnetic neural source imaging under sparsity constraints with SURE-based hyperparameter tuning
• Computer Science
• 2021
This paper proposes to use a proxy of the Stein’s Unbiased Risk Estimator (SURE) to automatically select their regularization parameters and shows that the proposed SURE approach outperforms cross-validation strategies and state-of-the-art Bayesian statistics methods both computationally and statistically.
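SURE-based hyperparameter selection is easiest to see in the simplest setting: soft-threshold denoising of Gaussian observations with known noise level σ, where Stein's divergence term reduces to counting the coordinates above the threshold. A hedged toy sketch of that classical formula (not the paper's proxy-SURE for source imaging; names are illustrative):

```python
import numpy as np

def soft_threshold(y, lam):
    """Elementwise soft-thresholding: sign(y) * max(|y| - lam, 0)."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def sure_soft_threshold(y, lam, sigma):
    """Stein's Unbiased Risk Estimate of the MSE of soft-threshold
    denoising, for y = theta + N(0, sigma^2) observations:
    SURE = ||ST(y) - y||^2 - n*sigma^2 + 2*sigma^2 * div,
    where the divergence is the number of surviving coordinates."""
    n = y.size
    df = np.count_nonzero(np.abs(y) > lam)   # divergence of the estimator
    residual = np.sum((soft_threshold(y, lam) - y) ** 2)
    return residual - n * sigma ** 2 + 2 * sigma ** 2 * df
```

Minimizing this estimate over a grid of λ values selects the threshold without access to the clean signal, which is the role SURE plays in the cited paper's hyperparameter tuning.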
PUDLE: Implicit Acceleration of Dictionary Learning by Backpropagation
• Computer Science
ArXiv
• 2021
A theoretical proof of these empirical results is offered through PUDLE, a Provable Unfolded Dictionary LEarning method, providing sufficient conditions on the network initialization and data distribution for model recovery, and highlighting the interpretability of PUDLE by deriving a mathematical relation between network weights, its output, and the training data.
Stable and Interpretable Unrolled Dictionary Learning
• Computer Science
• 2021
PUDLE’s interpretability is demonstrated, a driving factor in designing deep networks based on iterative optimizations, by building a mathematical relation between network weights, its output, and the training set.

## References

SHOWING 1-10 OF 127 REFERENCES
• Computer Science, Mathematics
AISTATS
• 2021
This work provides iteration complexity bounds for the mean square error of the hypergradient approximation, under the assumption that the lower-level problem is accessible only through a stochastic mapping which is a contraction in expectation.
This work proposes an algorithm for the optimization of continuous hyperparameters using inexact gradient information and gives sufficient conditions for the global convergence of this method, based on regularity conditions of the involved functions and summability of errors.
Optimizing Millions of Hyperparameters by Implicit Differentiation
• Computer Science
AISTATS
• 2020
An algorithm for inexpensive gradient-based hyperparameter optimization that combines the implicit function theorem (IFT) with efficient inverse Hessian approximations is proposed and used to train modern network architectures with millions of weights and millions of hyper-parameters.
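The IFT-based hypergradient mentioned above is easiest to see on a ridge inner problem, where the inner Hessian is available in closed form and no inverse-Hessian approximation is needed. A minimal illustrative sketch under those assumptions (function name and setup are hypothetical, not the cited paper's algorithm):

```python
import numpy as np

def ift_hypergradient(Xtr, ytr, Xval, yval, lam):
    """Hypergradient of the validation MSE w.r.t. the ridge penalty lam.
    Inner problem: min_w 0.5*||Xtr w - ytr||^2 + 0.5*lam*||w||^2.
    IFT: dw/dlam = -H^{-1} * d(grad_w inner)/dlam = -H^{-1} w."""
    p = Xtr.shape[1]
    H = Xtr.T @ Xtr + lam * np.eye(p)          # inner Hessian (closed form)
    w = np.linalg.solve(H, Xtr.T @ ytr)        # exact inner minimizer
    dw = -np.linalg.solve(H, w)                # implicit function theorem
    grad_val = 2 * Xval.T @ (Xval @ w - yval)  # gradient of validation MSE at w
    return float(grad_val @ dw)
```

With millions of hyperparameters the solve against H is replaced by the efficient inverse-Hessian approximations the paper proposes; the chain-rule structure stays the same.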
On the Iteration Complexity of Hypergradient Computation
• Computer Science, Mathematics
ICML
• 2020
A unified analysis is presented which allows for the first time to quantitatively compare these methods, providing explicit bounds for their iteration complexity, and suggests a hierarchy in terms of computational efficiency among the above methods.
Bilevel Programming for Hyperparameter Optimization and Meta-Learning
• Computer Science
ICML
• 2018
We introduce a framework based on bilevel programming that unifies gradient-based hyperparameter optimization and meta-learning. We show that an approximate version of the bilevel problem can be…
Differentiating the Value Function by using Convex Duality
• Computer Science
AISTATS
• 2021
This work uses a well known result from convex duality theory to relax the conditions and to derive convergence rates of the derivative approximation for several classes of parametric optimization problems in Machine Learning.
A Bilevel Optimization Approach for Parameter Learning in Variational Models
• Computer Science, Mathematics
SIAM J. Imaging Sci.
• 2013
This work considers a class of image denoising models incorporating $\ell_p$-norm-based analysis priors using a fixed set of linear operators, and devises semismooth Newton methods for solving the resulting nonsmooth bilevel optimization problems.