Least squares auto-tuning

Shane T. Barratt and Stephen P. Boyd. Least squares auto-tuning. Engineering Optimization, pages 789-810.
Least squares auto-tuning automatically finds hyper-parameters in least squares problems that minimize another (true) objective. The least squares tuning optimization problem is nonconvex, so in general it cannot be solved efficiently. This article presents a proximal gradient method for least squares auto-tuning that can be used to find good, if not the best, hyper-parameters for least squares problems. The application of least squares auto-tuning to data fitting is discussed. Numerical…
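A minimal sketch of the idea, under assumptions that are not taken from the article: the hyper-parameters are per-feature ridge weights λ_j = exp(ω_j), the true objective is mean squared error on a held-out validation set, and the regularizer on ω is the indicator of a box, whose proximal operator is simply clipping. The hypergradient comes from implicitly differentiating the regularized normal equations. The function names and the accept/reject step-size rule are hypothetical choices for illustration.

```python
import numpy as np


def solve_ls(A, b, omega):
    """theta(omega) = argmin ||A theta - b||^2 + sum_j exp(omega_j) * theta_j^2."""
    lam = np.exp(omega)
    M = A.T @ A + np.diag(lam)          # regularized normal-equation matrix (symmetric PD)
    return np.linalg.solve(M, A.T @ b), M


def val_loss_and_grad(omega, A, b, A_val, b_val):
    """True objective (validation MSE) and its gradient with respect to omega."""
    theta, M = solve_ls(A, b, omega)
    r = A_val @ theta - b_val
    loss = r @ r / len(b_val)
    g = 2.0 * A_val.T @ r / len(b_val)  # d loss / d theta
    # Differentiating M theta = A^T b gives d theta / d lam_j = -M^{-1} e_j theta_j.
    v = np.linalg.solve(M, g)           # M is symmetric, so M^{-T} g = M^{-1} g
    grad = -v * theta * np.exp(omega)   # chain rule through lam_j = exp(omega_j)
    return loss, grad


def auto_tune(A, b, A_val, b_val, omega0, lo=-8.0, hi=8.0, step=1.0, iters=100):
    """Proximal gradient on omega; the prox of the box indicator [lo, hi] is clipping."""
    omega = np.clip(omega0, lo, hi)
    loss, grad = val_loss_and_grad(omega, A, b, A_val, b_val)
    for _ in range(iters):
        cand = np.clip(omega - step * grad, lo, hi)    # gradient step, then prox
        cand_loss, cand_grad = val_loss_and_grad(cand, A, b, A_val, b_val)
        if cand_loss <= loss:                           # accept and enlarge the step
            omega, loss, grad = cand, cand_loss, cand_grad
            step *= 1.2
        else:                                           # reject and shrink the step
            step *= 0.5
    return omega, loss


# Usage on synthetic data with a sparse true coefficient vector.
rng = np.random.default_rng(0)
n, d = 40, 10
theta_true = np.zeros(d)
theta_true[:3] = [2.0, -1.0, 0.5]
A = rng.standard_normal((n, d))
b = A @ theta_true + 0.5 * rng.standard_normal(n)
A_val = rng.standard_normal((n, d))
b_val = A_val @ theta_true + 0.5 * rng.standard_normal(n)
omega0 = np.zeros(d)
loss0, _ = val_loss_and_grad(omega0, A, b, A_val, b_val)
omega, loss = auto_tune(A, b, A_val, b_val, omega0)
print(f"validation MSE: {loss0:.4f} -> {loss:.4f}")
```

Because a step is only accepted when the validation loss does not increase, the returned loss never exceeds the starting one; the nonconvexity noted above means it can still stop at a local minimum.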
A block coordinate descent optimizer for classification problems exploiting convexity
A coordinate descent method for training deep neural networks on classification tasks that exploits the global convexity of the cross-entropy loss in the weights of the linear layer: the resulting Hessian matrix is symmetric positive semidefinite, and the Newton step realizes a global minimizer.
Learning Convex Optimization Models
A heuristic is proposed for learning the parameters of a convex optimization model from a dataset of input-output pairs, using recently developed methods for differentiating the solution of a convex optimization problem with respect to its parameters.
Differentiable Convex Optimization Layers
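This paper introduces disciplined parametrized programming, a subset of disciplined convex programming, and shows that every disciplined parametrized program can be represented as the composition of an affine map from parameters to problem data, a solver, and an affine map from the solver's solution to a solution of the original problem. It then demonstrates how to efficiently differentiate through each of these components, allowing for end-to-end analytical differentiation through the entire convex program.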
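A minimal sketch of differentiating through a solution map, for a much simpler case than the paper treats: an equality-constrained quadratic program whose solution comes from one linear KKT system. The gradient of a downstream loss with respect to the problem data is obtained with the adjoint of the derivative (one extra linear solve). `qp_solve` and `qp_backward` are hypothetical helpers written for this illustration, not the API of any library.

```python
import numpy as np


def qp_solve(P, q, A, b):
    """Solve min 0.5 x^T P x + q^T x  s.t.  A x = b via its KKT system."""
    n, m = P.shape[0], A.shape[0]
    K = np.block([[P, A.T], [A, np.zeros((m, m))]])   # symmetric KKT matrix
    z = np.linalg.solve(K, np.concatenate([-q, b]))    # z = [x; nu]
    return z[:n], K


def qp_backward(K, n, grad_x):
    """Map dL/dx at the solution to (dL/dq, dL/db) with one adjoint solve.

    Since K z = [-q; b], dL = w^T [-dq; db] where K^T w = [dL/dx; 0].
    """
    m = K.shape[0] - n
    w = np.linalg.solve(K.T, np.concatenate([grad_x, np.zeros(m)]))
    return -w[:n], w[n:]


# Usage: loss L(x) = 0.5 * ||x - x_target||^2 on a random strictly convex QP.
rng = np.random.default_rng(1)
n, m = 4, 2
G = rng.standard_normal((n, n))
P = G @ G.T + np.eye(n)
q = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x_target = np.ones(n)
x, K = qp_solve(P, q, A, b)
dL_dq, dL_db = qp_backward(K, n, x - x_target)
```

The backward pass reuses the KKT matrix from the forward pass, which is why the cost of differentiation is comparable to the cost of one more solve.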
Finding the Closest Solvable Convex Optimization Problem
Given an infeasible, unbounded, or pathological convex optimization problem, a natural question to ask is: what is the smallest change we can make to the problem's parameters such that the problem…
Automatic repair of convex optimization problems
This paper proposes a heuristic for approximately finding the smallest change to a convex optimization problem's parameters that makes it solvable. The heuristic is based on the penalty method and leverages recently developed methods that can efficiently evaluate the derivative of the solution of a convex cone program with respect to its parameters.
Fitting a Kalman Smoother to Data
This paper derives a Kalman smoother auto-tuning algorithm, based on the proximal gradient method, that finds good, if not the best, parameters for a given dataset.
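For intuition, a very simple special case can be written out (an assumed illustration, not the paper's algorithm). For a scalar random-walk state observed in Gaussian noise, with a diffuse prior on the initial state, the Kalman smoother estimate solves a regularized least squares problem, and tuning reduces to choosing the ratio μ of measurement to process noise variance. This sketch picks μ by grid search on held-out measurements, a deliberately simpler stand-in for the proximal gradient method; `smooth` and `tune_mu` are hypothetical names.

```python
import numpy as np


def smooth(y, mask, mu):
    """Random-walk smoother as least squares (diffuse initial prior):
    minimize sum_{t: mask_t} (y_t - x_t)^2 + mu * sum_t (x_{t+1} - x_t)^2,
    where mu = (measurement noise variance) / (process noise variance)."""
    T = len(y)
    D = np.diff(np.eye(T), axis=0)          # (T-1) x T first-difference matrix
    W = np.diag(mask.astype(float))         # only observed entries enter the fit
    y_obs = np.where(mask, y, 0.0)
    return np.linalg.solve(W + mu * D.T @ D, W @ y_obs)


def tune_mu(y, mask, holdout, mus):
    """Grid search (an assumed tuning rule): fit without the held-out
    measurements, score by squared error on them, return the best mu."""
    train = mask & ~holdout
    errs = [np.mean((smooth(y, train, mu)[holdout] - y[holdout]) ** 2) for mu in mus]
    return mus[int(np.argmin(errs))]


# Usage: process noise std 0.1, measurement noise std 0.5 (so the ideal mu is 25).
rng = np.random.default_rng(2)
T = 200
x_true = np.cumsum(0.1 * rng.standard_normal(T))
y = x_true + 0.5 * rng.standard_normal(T)
mask = np.ones(T, dtype=bool)
holdout = rng.random(T) < 0.2
mus = np.logspace(-2, 4, 25)
best_mu = tune_mu(y, mask, holdout, mus)
x_hat = smooth(y, mask, best_mu)
```

Dense matrices keep the sketch short; a real smoother would exploit the banded structure of the system (or run the forward-backward recursions) to scale linearly in T.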
Improving the Targets’ Trajectories Estimated by an Automotive RADAR Sensor Using Polynomial Fitting
This paper proposes a two-step polynomial fitting method based on wavelets (denoising, then polynomial part extraction), which compares favorably with the classical polynomial fitting method.
Differentiating through Log-Log Convex Programs
This work shows how to efficiently compute the derivative (when it exists) of the solution map of log-log convex programs (LLCPs) and uses the adjoint of the derivative to implement differentiable log-log convex optimization layers in PyTorch and TensorFlow.


A Regularized Variable Projection Algorithm for Separable Nonlinear Least-Squares Problems
For ill-conditioned separable nonlinear least-squares (SNLLS) problems solved with the variable projection method, this paper proposes determining the regularization parameter at every iteration with the weighted generalized cross-validation method, so that the objective decreases consistently at successive iterations.
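As an illustration of the parameter-selection step only (the full variable projection iteration is not shown), the weighted GCV function for a Tikhonov-regularized linear least squares problem min ‖Jx − y‖² + λ‖x‖² can be evaluated from one SVD of J. Here the influence matrix is A(λ) = J(JᵀJ + λI)⁻¹Jᵀ and the score is n‖(I − A(λ))y‖² / (trace(I − ωA(λ)))², where ω = 1 recovers standard GCV. In a variable projection setting, J would presumably be the matrix multiplying the linear parameters at the current nonlinear parameters; that mapping, and the names `wgcv` and `pick_lambda`, are assumptions of this sketch.

```python
import numpy as np


def wgcv(J, y, lam, omega=1.0):
    """Weighted GCV score for min ||J x - y||^2 + lam * ||x||^2:
    n * ||(I - A) y||^2 / trace(I - omega * A)^2, with A = J (J^T J + lam I)^{-1} J^T.
    omega = 1 gives standard GCV."""
    n = len(y)
    U, s, _ = np.linalg.svd(J, full_matrices=False)
    f = s**2 / (s**2 + lam)              # filter factors: A = U diag(f) U^T
    resid = y - U @ (f * (U.T @ y))      # (I - A) y
    return n * (resid @ resid) / (n - omega * f.sum()) ** 2


def pick_lambda(J, y, lams, omega=1.0):
    """Return the candidate lam with the smallest WGCV score."""
    scores = [wgcv(J, y, lam, omega) for lam in lams]
    return lams[int(np.argmin(scores))]


# Usage on a random, mildly noisy linear problem.
rng = np.random.default_rng(3)
J = rng.standard_normal((50, 8))
y = J @ rng.standard_normal(8) + 0.3 * rng.standard_normal(50)
lams = np.logspace(-4, 2, 30)
lam_best = pick_lambda(J, y, lams)
```

One SVD serves every candidate λ, so scanning a grid costs little more than a single fit.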
Hyperparameter optimization with approximate gradient
This work proposes an algorithm for the optimization of continuous hyperparameters using inexact gradient information and gives sufficient conditions for the global convergence of this method, based on regularity conditions of the involved functions and summability of errors.
Solving least squares problems
Since the lm function provides many features, it is rather complicated, so we instead use the function lsfit as a model. It computes only the coefficient estimates and the residuals.
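The entry above refers to R's `lm` and `lsfit`. As a sketch of the same idea in Python (not a port of `lsfit`'s interface), a fit that returns only the coefficient estimates and the residuals can be built on a thin QR factorization. `ls_fit` and its `intercept` flag are hypothetical names chosen for this example.

```python
import numpy as np


def ls_fit(X, y, intercept=True):
    """Least squares fit of y on X that returns only (coefficients, residuals)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                            # a single predictor as one column
    if intercept:
        X = np.column_stack([np.ones(len(X)), X])
    Q, R = np.linalg.qr(X)                        # thin QR: X = Q R, Q^T Q = I
    # R is upper triangular; scipy.linalg.solve_triangular would exploit that.
    coef = np.linalg.solve(R, Q.T @ y)
    return coef, y - X @ coef


# Usage: points exactly on y = 1 + 2 x give coef close to [1, 2] and residuals close to 0.
x = np.arange(5.0)
coef, res = ls_fit(x, 1.0 + 2.0 * x)
```

QR avoids forming XᵀX, whose condition number is the square of that of X, which is the usual reason to prefer it over the normal equations.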
Scalable Gradient-Based Tuning of Continuous Regularization Hyperparameters
A gradient-based approach for tuning regularization hyperparameters is explored; in experiments on MNIST, SVHN and CIFAR-10, the resulting regularization levels fall within the optimal regions.
An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models
It is shown that, for large-scale problems involving a wide choice of kernel-based models and validation functions, the gradient of the validation function with respect to the hyperparameters can be computed very efficiently, often within just a fraction of the training time.
Gradient-Based Optimization of Hyperparameters
This article presents a methodology for optimizing several hyperparameters, based on computing the gradient of a model selection criterion with respect to the hyperparameters. In the general case, the formula for this hyperparameter gradient involves second derivatives of the training criterion.
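The general formula mentioned above follows from standard implicit-function-theorem reasoning. The notation is assumed here rather than taken from the article: a training criterion C(θ, λ) with minimizer θ*(λ), a model selection criterion E(θ, λ), and a Hessian of C in θ that is invertible at θ*.

```latex
\nabla_\theta C\bigl(\theta^\star(\lambda), \lambda\bigr) = 0
\quad\Longrightarrow\quad
\frac{\partial \theta^\star}{\partial \lambda}
  = -\bigl[\nabla^2_{\theta\theta} C\bigr]^{-1} \nabla^2_{\theta\lambda} C,
\qquad
\frac{dE}{d\lambda}
  = \nabla_\lambda E
  + \Bigl(\frac{\partial \theta^\star}{\partial \lambda}\Bigr)^{\!\top} \nabla_\theta E .
```

In practice one solves a single linear system ∇²_θθ C · v = ∇_θ E instead of forming the inverse Hessian, after which dE/dλ = ∇_λ E − (∇²_θλ C)ᵀ v.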
Learning to Reweight Examples for Robust Deep Learning
This work proposes a novel meta-learning algorithm that learns to assign weights to training examples based on their gradient directions. It can be easily implemented on any type of deep network, does not require any additional hyperparameter tuning, and achieves impressive performance on class-imbalance and corrupted-label problems where only a small amount of clean validation data is available.
Adaptive Regularization in Neural Network Modeling
The idea is to minimize an empirical estimate of the generalization error (such as the cross-validation estimate) with respect to the regularization parameters, using a simple iterative gradient descent scheme that requires virtually no additional programming overhead compared to standard training.
Adaptive Regularization in Neural Network Modeling
The idea is to minimize an empirical estimate of the generalization error with respect to the regularization parameters, using a simple iterative gradient descent scheme that requires virtually no additional programming overhead compared to standard training.
Gradient-based Hyperparameter Optimization through Reversible Learning
This work computes exact gradients of cross-validation performance with respect to all hyperparameters by chaining derivatives backwards through the entire training procedure. This makes it possible to optimize thousands of hyperparameters, including step-size and momentum schedules, weight initialization distributions, richly parameterized regularization schemes, and neural network architectures.
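A small sketch of chaining derivatives backwards through training, with simplifying assumptions: training is T steps of plain gradient descent on a ridge-regularized least squares loss, and the hyperparameters are the regularization weight λ and the step size α. Unlike the paper, which (as its title suggests) reverses the training dynamics rather than storing them, this sketch simply keeps every iterate in memory. All function names are hypothetical.

```python
import numpy as np


def train(w0, X, y, lam, alpha, T):
    """T gradient steps on (1/2n)||X w - y||^2 + (lam/2)||w||^2; returns every iterate."""
    n = len(y)
    ws = [w0]
    for _ in range(T):
        w = ws[-1]
        g = X.T @ (X @ w - y) / n + lam * w
        ws.append(w - alpha * g)
    return ws


def val_loss(w, Xv, yv):
    r = Xv @ w - yv
    return 0.5 * (r @ r) / len(yv)


def hypergrad(w0, X, y, Xv, yv, lam, alpha, T):
    """Reverse-mode derivative of the final validation loss w.r.t. lam and alpha."""
    n = len(y)
    ws = train(w0, X, y, lam, alpha, T)
    w_bar = Xv.T @ (Xv @ ws[-1] - yv) / len(yv)   # adjoint of w_T
    lam_bar = alpha_bar = 0.0
    for t in range(T - 1, -1, -1):                 # walk the steps backwards
        w = ws[t]
        g = X.T @ (X @ w - y) / n + lam * w        # training gradient used at step t
        lam_bar += w_bar @ (-alpha * w)            # d w_{t+1} / d lam = -alpha * w_t
        alpha_bar += w_bar @ (-g)                  # d w_{t+1} / d alpha = -g_t
        # d w_{t+1} / d w_t = I - alpha * (X^T X / n + lam I), a symmetric matrix
        w_bar = w_bar - alpha * (X.T @ (X @ w_bar) / n + lam * w_bar)
    return val_loss(ws[-1], Xv, yv), lam_bar, alpha_bar


# Usage on random data.
rng = np.random.default_rng(5)
X, Xv = rng.standard_normal((30, 5)), rng.standard_normal((30, 5))
w_true = rng.standard_normal(5)
y = X @ w_true + 0.3 * rng.standard_normal(30)
yv = Xv @ w_true + 0.3 * rng.standard_normal(30)
w0, lam, alpha, T = np.zeros(5), 0.1, 0.1, 50
loss, lam_bar, alpha_bar = hypergrad(w0, X, y, Xv, yv, lam, alpha, T)
```

Storing all T iterates costs memory linear in T; avoiding that cost is the motivation for reversing the dynamics instead.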