• Corpus ID: 1791473

OptNet: Differentiable Optimization as a Layer in Neural Networks

Brandon Amos and J. Zico Kolter
This paper presents OptNet, a network architecture that integrates optimization problems (here, specifically in the form of quadratic programs) as individual layers in larger end-to-end trainable deep networks. These layers encode constraints and complex dependencies between the hidden states that traditional convolutional and fully-connected layers often cannot capture. In this paper, we explore the foundations for such an architecture: we show how techniques from sensitivity analysis, bilevel… 
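The core idea, differentiating through a QP solution, can be sketched with implicit differentiation of the KKT system. Below is a minimal numpy sketch for the equality-constrained case only; the function names are illustrative and the paper's full method additionally handles inequality constraints:

```python
import numpy as np

def qp_forward(Q, q, A, b):
    """Solve min_z 0.5 z'Qz + q'z  s.t.  Az = b  by solving the KKT system."""
    n, m = Q.shape[0], A.shape[0]
    K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-q, b]))
    return sol[:n], K  # optimal z and the KKT matrix, reused in the backward pass

def qp_grad_q(K, n, dl_dz):
    """Gradient of a loss l(z*) w.r.t. q via the implicit function theorem:
    solve K d = [dl_dz; 0] and return -d_z (K is symmetric, so no transpose)."""
    d = np.linalg.solve(K, np.concatenate([dl_dz, np.zeros(K.shape[0] - n)]))
    return -d[:n]
```

The analytic gradient can be checked against central finite differences of the solve itself, which is a useful sanity test when implementing any such layer.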

Efficient differentiable quadratic programming layers: an ADMM approach

This paper presents an alternative network layer architecture based on the alternating direction method of multipliers (ADMM) that scales to problems with a moderately large number of variables and is efficient, in both memory and computation, compared to the standard approach.

Differentiable Convex Optimization Layers

This paper introduces disciplined parametrized programming, a subset of disciplined convex programming, and demonstrates how to efficiently differentiate through each of these components, allowing for end-to-end analytical differentiation through the entire convex program.

Differentiable Forward and Backward Fixed-Point Iteration Layers

Experiments show that the fixed-point iteration (FPI) layer can be successfully applied to real-world problems such as image denoising, optical flow, and multi-label classification.



Differentiable Optimization of Generalized Nondecomposable Functions using Linear Programs

It is shown how adopting a set of ideas proposed by Mangasarian for 1-norm SVMs, which advocate solving LPs with a generalized Newton method, provides a simple and effective solution that can be run on the GPU and applies without any problem-specific adjustments or relaxations.

Convex optimization with an interpolation-based projection and its application to deep learning

This paper proposes an interpolation-based projection that is computationally cheap and easy to compute given a convex, domain-defining function. It further proposes an optimization algorithm that follows the gradient of the composition of the objective and the projection, and proves its convergence for linear objectives under arbitrary convex and Lipschitz domain-defining inequality constraints.

Differentiable Fixed-Point Iteration Layer

It is shown that the derivative of an FPI layer depends only on the fixed point itself; a method is then presented to compute it efficiently using another fixed-point iteration, called the backward FPI.
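The backward-FPI idea can be illustrated on a small contractive map. In this hedged numpy sketch (the map and all names are illustrative, not the paper's), the gradient through a fixed point x* = f(x*) is itself obtained by a fixed-point iteration v ← g + Jᵀv, whose limit is (I − Jᵀ)⁻¹g for the Jacobian J of f at x*:

```python
import numpy as np

def forward_fpi(f, x0, iters=300):
    """Iterate x <- f(x) to (approximate) convergence at a fixed point."""
    x = x0
    for _ in range(iters):
        x = f(x)
    return x

def backward_fpi(J, dl_dx, iters=300):
    """Solve v = dl_dx + J^T v by iteration; the limit (I - J^T)^{-1} dl_dx
    maps the loss gradient at the fixed point back through the layer."""
    v = np.zeros_like(dl_dx)
    for _ in range(iters):
        v = dl_dx + J.T @ v
    return v
```

For example, with f(x) = W tanh(x) + u and ‖W‖ < 1, the Jacobian at the fixed point is J = W diag(1 − tanh²(x*)), and since ∂f/∂u = I, the backward FPI directly yields the gradient of the loss with respect to u.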


This paper proposes a novel differentiable spectral projection layer for neural networks that efficiently enforces spatial PDE constraints using spectral methods while remaining fully differentiable, allowing its use as a layer within convolutional neural networks (CNNs) during end-to-end training; its computational cost is shown to be cheaper than that of a single convolution layer.

Exploiting Problem Structure in Deep Declarative Networks: Two Case Studies

This work studies two applications of deep declarative networks—robust vector pooling and optimal transport—and shows how problem structure can be exploited to obtain very efficient backward pass computations in terms of both time and memory.

Gradient Backpropagation Through Combinatorial Algorithms: Identity with Projection Works

A principled approach is proposed that exploits the geometry of the discrete solution space to treat the solver as a negative identity on the backward pass, together with a theoretical justification for doing so.


References

Input Convex Neural Networks

This paper presents the input convex neural network architecture. These are scalar-valued (potentially deep) neural networks with constraints on the network parameters such that the output of the network is a convex function of (some of) the inputs.

Adam: A Method for Stochastic Optimization

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
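The update Adam performs can be sketched in a few lines of numpy. This is a hedged illustration with the commonly used default-style hyperparameters; the function name `adam_step` is mine, not the paper's:

```python
import numpy as np

def adam_step(grad, x, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m) and
    its square (v), bias correction, then a step scaled by the second moment."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)   # bias-corrected second moment
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v
```

For instance, minimizing f(x) = (x − 3)² from x = 0 with this step brings x into a small neighborhood of 3 after a few hundred iterations.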

Conditional Random Fields as Recurrent Neural Networks

A new form of convolutional neural network is introduced that combines the strengths of convolutional neural networks (CNNs) and conditional random field (CRF)-based probabilistic graphical modelling, obtaining top results on the challenging Pascal VOC 2012 segmentation benchmark.

On Differentiating Parameterized Argmin and Argmax Problems with Application to Bi-level Optimization

Some results on differentiating argmin and argmax optimization problems with and without constraints are collected and some insightful motivating examples are provided.
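For the unconstrained scalar case, the key identity is dx*/dθ = −g_xθ(x*, θ) / g_xx(x*, θ) for x*(θ) = argmin_x g(x, θ). A hedged numpy check of this identity, where the second derivatives are estimated by central finite differences (the toy objective and the name `argmin_grad` are mine, for illustration):

```python
import numpy as np

def argmin_grad(g, x_star, theta, h=1e-4):
    """dx*/dtheta = -g_xtheta / g_xx at the minimizer x_star, with both
    second derivatives estimated by central finite differences."""
    g_xx = (g(x_star + h, theta) - 2 * g(x_star, theta)
            + g(x_star - h, theta)) / h**2
    g_xt = (g(x_star + h, theta + h) - g(x_star + h, theta - h)
            - g(x_star - h, theta + h) + g(x_star - h, theta - h)) / (4 * h**2)
    return -g_xt / g_xx
```

As a sanity check, for g(x, θ) = (x − θ²)² the minimizer is x*(θ) = θ², so the derivative should come out as 2θ.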

On solving constrained optimization problems with neural networks: a penalty method approach

The canonical nonlinear programming circuit is shown to be a gradient system that seeks to minimize an unconstrained energy function that can be viewed as a penalty method approximation of the original problem.
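The penalty-method view can be sketched as plain gradient descent on an unconstrained energy. In this hedged toy example (not the paper's circuit model), the constraint 1ᵀx = 1 is folded into the objective as a quadratic penalty:

```python
import numpy as np

def penalty_minimize(c, rho=100.0, lr=2e-3, iters=5000):
    """Gradient descent on ||x - c||^2 + rho * (sum(x) - 1)^2, a penalty
    approximation of projecting c onto the hyperplane sum(x) = 1."""
    x = np.zeros_like(c)
    for _ in range(iters):
        grad = 2.0 * (x - c) + 2.0 * rho * (x.sum() - 1.0) * np.ones_like(c)
        x = x - lr * grad
    return x
```

As rho grows, the minimizer of the penalized energy approaches the exact projection c − ((1ᵀc − 1)/n)·1, illustrating how the unconstrained gradient system approximately enforces the constraint.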

Generative Adversarial Nets

We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.

Generic Methods for Optimization-Based Modeling

Experimental results on denoising and image labeling problems show that learning with truncated optimization greatly reduces computational expense compared to “full” fitting.

End-to-End Learning for Structured Prediction Energy Networks

End-to-end learning for SPENs is presented, where the energy function is discriminatively trained by back-propagating through gradient-based prediction, and the approach is substantially more accurate than the structured SVM method of Belanger and McCallum (2016).

A Bilevel Optimization Approach for Parameter Learning in Variational Models

This work considers a class of image denoising models incorporating $\ell_p$-norm-based analysis priors using a fixed set of linear operators, and devises semismooth Newton methods for solving the resulting nonsmooth bilevel optimization problems.

Large Margin Methods for Structured and Interdependent Output Variables

This paper proposes to appropriately generalize the well-known notion of a separation margin and derive a corresponding maximum-margin formulation and presents a cutting plane algorithm that solves the optimization problem in polynomial time for a large class of problems.