• Corpus ID: 1791473

# OptNet: Differentiable Optimization as a Layer in Neural Networks

@inproceedings{Amos2017OptNetDO,
  title={OptNet: Differentiable Optimization as a Layer in Neural Networks},
  author={Brandon Amos and J. Zico Kolter},
  booktitle={ICML},
  year={2017}
}
• Published in ICML, 1 March 2017
• Computer Science
This paper presents OptNet, a network architecture that integrates optimization problems (here, specifically in the form of quadratic programs) as individual layers in larger end-to-end trainable deep networks. These layers encode constraints and complex dependencies between the hidden states that traditional convolutional and fully-connected layers often cannot capture. In this paper, we explore the foundations for such an architecture: we show how techniques from sensitivity analysis, bilevel…
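The core mechanism behind such a layer, differentiating the solution of a quadratic program through its KKT conditions, can be sketched for the equality-constrained case. This is a minimal NumPy illustration of the idea, not the paper's batched GPU solver; the problem data are randomly generated for demonstration.

```python
import numpy as np

def solve_eq_qp(Q, p, A, b):
    """Solve min_z 0.5 z^T Q z + p^T z  s.t.  A z = b via the KKT system."""
    n, m = Q.shape[0], A.shape[0]
    K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-p, b]))
    return sol[:n], sol[n:], K          # primal z, dual nu, KKT matrix

def grad_z_wrt_p(K, n):
    """Implicit differentiation of the KKT conditions:
    K [dz; dnu] = [-dp; 0]  =>  dz/dp = -(K^{-1})[:n, :n]."""
    return -np.linalg.inv(K)[:n, :n]

rng = np.random.default_rng(0)
n, m = 4, 2
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)             # positive definite objective
p = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

z, nu, K = solve_eq_qp(Q, p, A, b)
J = grad_z_wrt_p(K, n)

# Check the analytic Jacobian dz/dp against finite differences.
eps = 1e-6
J_fd = np.zeros((n, n))
for i in range(n):
    dp = np.zeros(n); dp[i] = eps
    z_plus, _, _ = solve_eq_qp(Q, p + dp, A, b)
    J_fd[:, i] = (z_plus - z) / eps
assert np.allclose(J, J_fd, atol=1e-4)
```

Because the KKT system is linear in the problem data, a single factorization of the KKT matrix yields gradients with respect to all parameters, which is what makes the backward pass cheap relative to re-solving the QP.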
510 Citations

## Citations

• Computer Science
ArXiv
• 2021
This paper presents an alternative network-layer architecture based on the alternating direction method of multipliers (ADMM) that scales to problems with a moderately large number of variables and is efficient, in both memory and computation, compared to the standard approach.

### Differentiable Convex Optimization Layers

• Computer Science
NeurIPS
• 2019
This paper introduces disciplined parametrized programming, a subset of disciplined convex programming, and demonstrates how to efficiently differentiate through each of these components, allowing for end-to-end analytical differentiation through the entire convex program.

### Differentiable Forward and Backward Fixed-Point Iteration Layers

• Computer Science
IEEE Access
• 2021
Experiments show that the fixed-point iteration (FPI) layer can be successfully applied to real-world problems such as image denoising, optical flow, and multi-label classification.

### Differentiable Optimization of Generalized Nondecomposable Functions

• Computer Science
• 2020
It is shown how adopting a set of influential ideas proposed by Mangasarian for 1-norm SVMs, which advocate solving LPs with a generalized Newton method, provides a simple and effective solution that needs little unrolling, making the backward pass more efficient.

### Differentiable Optimization of Generalized Nondecomposable Functions using Linear Programs

• Computer Science
NeurIPS
• 2021
It is shown how adopting a set of ingenious ideas proposed by Mangasarian for 1-norm SVMs, which advocate solving LPs with a generalized Newton method, provides a simple and effective solution that runs on the GPU and applies without any problem-specific adjustments or relaxations.

### Convex optimization with an interpolation-based projection and its application to deep learning

• Computer Science
Mach. Learn.
• 2021
This paper proposes an interpolation-based projection that is computationally cheap and easy to compute given a convex, domain-defining function. It further proposes an optimization algorithm that follows the gradient of the composition of the objective and the projection, and proves its convergence for linear objectives and arbitrary convex, Lipschitz, domain-defining inequality constraints.

### Differentiable Fixed-Point Iteration Layer

• Computer Science
ArXiv
• 2020
It is shown that the derivative of an FPI layer depends only on the fixed point, and a method is presented to calculate it efficiently using another FPI, called the backward FPI.
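The backward-FPI idea can be sketched in NumPy for a hypothetical toy contraction map f(z, θ) = W tanh(z) + θ (not the paper's construction): the factor (I − ∂f/∂z)⁻¹ in the implicit-function-theorem gradient is itself computed by a second fixed-point iteration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
W = rng.standard_normal((n, n))
W *= 0.3 / np.linalg.norm(W, 2)      # spectral norm 0.3 => f is a contraction
theta = rng.standard_normal(n)

def f(z, theta):
    return W @ np.tanh(z) + theta

# Forward FPI: iterate to the fixed point z* = f(z*, theta).
z = np.zeros(n)
for _ in range(200):
    z = f(z, theta)

# Backward FPI: dz*/dtheta = (I - df/dz)^{-1} df/dtheta, where the inverse
# is computed by iterating S <- (df/dz) S + I to convergence.
dfdz = W @ np.diag(1.0 - np.tanh(z) ** 2)   # df/dz at the fixed point
S = np.zeros((n, n))
for _ in range(200):
    S = dfdz @ S + np.eye(n)                 # converges to (I - df/dz)^{-1}
J = S                                        # here df/dtheta = I

# Finite-difference check of the Jacobian.
eps = 1e-6
J_fd = np.zeros((n, n))
for i in range(n):
    t2 = theta.copy(); t2[i] += eps
    z2 = np.zeros(n)
    for _ in range(200):
        z2 = f(z2, t2)
    J_fd[:, i] = (z2 - z) / eps
assert np.allclose(J, J_fd, atol=1e-4)
```

Note the gradient computation never needs the trajectory of forward iterates, only the converged fixed point, which is the memory advantage these layers claim over unrolling.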

### CNNs through Differentiable PDE Layer

• Computer Science
• 2020
A novel differentiable spectral projection layer for neural networks that efficiently enforces spatial PDE constraints using spectral methods. The layer is fully differentiable, allowing its use within Convolutional Neural Networks (CNNs) during end-to-end training, and its computational cost is shown to be cheaper than that of a single convolution layer.

### Exploiting Problem Structure in Deep Declarative Networks: Two Case Studies

• Computer Science
ArXiv
• 2022
This work studies two applications of deep declarative networks—robust vector pooling and optimal transport—and shows how problem structure can be exploited to obtain very efficient backward pass computations in terms of both time and memory.

### Gradient Backpropagation Through Combinatorial Algorithms: Identity with Projection Works

• Computer Science
ArXiv
• 2022
A principled approach is proposed that exploits the geometry of the discrete solution space to treat the solver as a negative identity on the backward pass, together with a theoretical justification.

## References

Showing 1-10 of 40 references

### Input Convex Neural Networks

• Computer Science
ICML
• 2017
This paper presents the input convex neural network architecture: scalar-valued (potentially deep) neural networks with constraints on the network parameters such that the output of the network is a convex function of (some of) the inputs.
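The convexity mechanism can be sketched with a tiny two-layer network (hypothetical sizes, not the paper's full architecture): nonnegative weights on the previous hidden state plus convex, nondecreasing activations (ReLU) make the scalar output convex in the input.

```python
import numpy as np

rng = np.random.default_rng(3)
relu = lambda x: np.maximum(x, 0)

# Two-layer input convex network: the "passthrough" weights Wz2 acting on the
# previous hidden state are constrained nonnegative; ReLU is convex and
# nondecreasing; the result is convex in the input y.
d, h = 4, 8
Wy1, b1 = rng.standard_normal((h, d)), rng.standard_normal(h)
Wz2 = np.abs(rng.standard_normal((1, h)))    # nonnegative by construction
Wy2, b2 = rng.standard_normal((1, d)), rng.standard_normal(1)

def icnn(y):
    z1 = relu(Wy1 @ y + b1)
    out = Wz2 @ z1 + Wy2 @ y + b2            # nonneg. combination + affine term
    return relu(out)[0]

# Midpoint convexity check on random input pairs:
# f((y1 + y2)/2) <= (f(y1) + f(y2)) / 2.
for _ in range(100):
    y1, y2 = rng.standard_normal(d), rng.standard_normal(d)
    assert icnn((y1 + y2) / 2) <= 0.5 * (icnn(y1) + icnn(y2)) + 1e-9
```

The direct connections Wy2 from the input are unconstrained because an affine function of the input preserves convexity; only the state-to-state weights need the sign constraint.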

### Adam: A Method for Stochastic Optimization

• Computer Science
ICLR
• 2015
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
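The Adam update rule is compact enough to sketch in full; this is the standard published form, demonstrated here on a toy deterministic quadratic rather than a stochastic objective.

```python
import numpy as np

def adam_step(theta, g, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: adaptive estimates of the first and second moments
    of the gradient, with bias correction for their zero initialization."""
    m = b1 * m + (1 - b1) * g            # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g**2         # second-moment (uncentered) estimate
    m_hat = m / (1 - b1**t)              # bias-corrected moments
    v_hat = v / (1 - b2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = ||theta - c||^2 for a fixed target c.
c = np.array([1.0, -2.0, 3.0])
theta = np.zeros(3)
m, v = np.zeros(3), np.zeros(3)
for t in range(1, 3001):                 # t is 1-indexed for bias correction
    g = 2 * (theta - c)                  # gradient of the quadratic
    theta, m, v = adam_step(theta, g, m, v, t)
assert np.allclose(theta, c, atol=0.05)
```

The m_hat / sqrt(v_hat) ratio makes early steps roughly lr in magnitude regardless of gradient scale, which is the "adaptive" behavior the abstract refers to.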

### Conditional Random Fields as Recurrent Neural Networks

• Computer Science
2015 IEEE International Conference on Computer Vision (ICCV)
• 2015
A new form of convolutional neural network is introduced that combines the strengths of Convolutional Neural Networks (CNNs) and Conditional Random Fields (CRFs)-based probabilistic graphical modelling, obtaining top results on the challenging Pascal VOC 2012 segmentation benchmark.

### On Differentiating Parameterized Argmin and Argmax Problems with Application to Bi-level Optimization

• Computer Science
ArXiv
• 2016
Some results on differentiating argmin and argmax optimization problems with and without constraints are collected and some insightful motivating examples are provided.

### On solving constrained optimization problems with neural networks: a penalty method approach

• Computer Science
IEEE Trans. Neural Networks
• 1993
The canonical nonlinear programming circuit is shown to be a gradient system that seeks to minimize an unconstrained energy function that can be viewed as a penalty method approximation of the original problem.
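The penalty-method idea, replacing a hard constraint with a quadratic penalty and descending the resulting unconstrained energy, can be sketched on a toy problem; plain gradient descent here stands in for the circuit's continuous gradient flow, and the problem itself is a hypothetical example.

```python
import numpy as np

# Constrained problem: minimize f(x) = ||x - [2, 2]||^2  s.t.  x1 + x2 = 1.
# Penalty energy: E(x) = f(x) + mu * (x1 + x2 - 1)^2, minimized by descent.
mu = 100.0
target = np.array([2.0, 2.0])

def grad_E(x):
    return 2 * (x - target) + 2 * mu * (x[0] + x[1] - 1) * np.ones(2)

x = np.zeros(2)
lr = 0.002                  # below 2 / lambda_max(Hessian) = 2 / (2 + 4*mu)
for _ in range(5000):
    x -= lr * grad_E(x)

# The penalized minimizer sits at a = (2 + mu) / (1 + 2*mu) in each coordinate
# and approaches the true constrained solution (0.5, 0.5) as mu grows.
a = (2 + mu) / (1 + 2 * mu)
assert np.allclose(x, a, atol=1e-6)
assert abs(x[0] + x[1] - 1) < 0.02   # small residual constraint violation
```

This illustrates the method's characteristic trade-off: any finite penalty weight leaves a small constraint violation, which shrinks as mu increases at the cost of worse conditioning.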

### Generative Adversarial Nets

• Computer Science
NIPS
• 2014
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.

### Generic Methods for Optimization-Based Modeling

Experimental results on denoising and image labeling problems show that learning with truncated optimization greatly reduces computational expense compared to “full” fitting.

### End-to-End Learning for Structured Prediction Energy Networks

• Computer Science
ICML
• 2017
End-to-end learning for SPENs is presented, where the energy function is discriminatively trained by back-propagating through gradient-based prediction, and the approach is substantially more accurate than the structured SVM method of Belanger and McCallum (2016).

### A Bilevel Optimization Approach for Parameter Learning in Variational Models

• Computer Science, Mathematics
SIAM J. Imaging Sci.
• 2013
This work considers a class of image denoising models incorporating $\ell_p$-norm-based analysis priors using a fixed set of linear operators, and devises semismooth Newton methods for solving the resulting nonsmooth bilevel optimization problems.

### Large Margin Methods for Structured and Interdependent Output Variables

• Computer Science
J. Mach. Learn. Res.
• 2005
This paper proposes to appropriately generalize the well-known notion of a separation margin, derives a corresponding maximum-margin formulation, and presents a cutting-plane algorithm that solves the optimization problem in polynomial time for a large class of problems.