Corpus ID: 235632013

Stabilizing Equilibrium Models by Jacobian Regularization

Shaojie Bai, Vladlen Koltun, J. Zico Kolter
Deep equilibrium networks (DEQs) are a new class of models that eschew traditional depth in favor of finding the fixed point of a single nonlinear layer. These models have been shown to achieve performance competitive with state-of-the-art deep networks while using significantly less memory. Yet they are also slower, brittle to architectural choices, and introduce potential instability into the model. In this paper, we propose a regularization scheme for DEQ models that explicitly regularizes…
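The fixed-point view above can be sketched in a few lines. This is a hypothetical minimal illustration using plain fixed-point iteration on a contractive tanh layer; the matrices `W` and `U` are illustrative, and practical DEQs use accelerated root-finders (e.g. Broyden's method) rather than this naive loop:

```python
import numpy as np

def deq_forward(f, x, z0, tol=1e-8, max_iter=500):
    """Solve z* = f(z*, x) by plain fixed-point iteration.
    (Practical DEQs use accelerated root-finders such as Broyden's method.)"""
    z = z0
    for _ in range(max_iter):
        z_next = f(z, x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
W /= 2 * np.linalg.norm(W, 2)   # ||W|| = 0.5 < 1 makes the layer contractive
U = rng.standard_normal((4, 4))

def layer(z, x):                # the single layer whose fixed point is the "output"
    return np.tanh(W @ z + U @ x)

x = rng.standard_normal(4)
z_star = deq_forward(layer, x, np.zeros(4))
```

The contraction condition (`||W|| < 1`, with tanh being 1-Lipschitz) is what guarantees this simple iteration converges; the brittleness the paper targets arises when such conditions are not enforced.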

Stable Invariant Models via Koopman Spectra

The stable invariant model (SIM) is presented, a new class of deep models that in principle approximates DEQs under stability and extends the dynamics to more general ones converging to an invariant set (not restricted to a fixed point).

Optimization Induced Equilibrium Networks: An Explicit Optimization Perspective for Understanding Equilibrium Models

This paper decomposes DNNs into a new class of unit layers, each the proximal operator of an implicit convex function, while keeping the output unchanged, and derives the equilibrium model of the unit layer, named OptEq, which outperforms previous implicit models even with fewer parameters.

Deep Equilibrium Optical Flow Estimation

This work proposes deep equilibrium (DEQ) flow estimators, an approach that directly solves for the flow as the infinite-level fixed point of an implicit layer (using any black-box solver) and differentiates through this fixed point analytically (thus requiring O(1) training memory).

Streaming Multiscale Deep Equilibrium Models

Through extensive experimental analysis, it is shown that StreamDEQ is able to recover near-optimal representations within a few frames' time, maintain an up-to-date representation throughout the video, and achieve on-par accuracy with the baseline (standard MDEQ) while being more than 3× faster.

Mixing Implicit and Explicit Deep Learning with Skip DEQs and Infinite Time Neural ODEs (Continuous DEQs)

Skip DEQ is developed, an implicit-explicit (IMEX) layer that simultaneously trains an explicit prediction followed by an implicit correction, and it is shown how bridging the dichotomy of implicit and explicit deep learning can combine the advantages of both techniques.

Optimization inspired Multi-Branch Equilibrium Models

A new type of implicit model inspired by the design of the systems' hidden objective functions, called Multi-branch Optimization induced Equilibrium networks (MOptEqs), which can better utilize hierarchical patterns for recognition tasks while retaining the ability to interpret the whole structure as seeking the minima of the problem's goal.

Object Representations as Fixed Points: Training Iterative Refinement Algorithms with Implicit Differentiation

This work observes that iterative refinement methods can be made differentiable by means of the implicit function theorem, and develops an implicit differentiation approach that improves the stability and tractability of training by decoupling the forward and backward passes.
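The implicit-function-theorem trick described here can be sketched numerically. The layer, sizes, and fixed-point solver below are illustrative assumptions; the point is that the backward pass (`implicit_jacobian`) is decoupled from however the forward fixed point was obtained:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 3))
W /= 2 * np.linalg.norm(W, 2)           # contraction, so a unique fixed point exists
U = rng.standard_normal((3, 3))

def f(z, x):
    return np.tanh(W @ z + U @ x)

def solve_fixed_point(x, iters=300):
    z = np.zeros(3)
    for _ in range(iters):
        z = f(z, x)
    return z

def implicit_jacobian(z_star, x):
    """dz*/dx from the implicit function theorem:
    z* = f(z*, x)  =>  dz*/dx = (I - df/dz)^{-1} df/dx at the fixed point."""
    s = 1.0 - np.tanh(W @ z_star + U @ x) ** 2   # derivative of tanh
    J_z = s[:, None] * W                          # df/dz
    J_x = s[:, None] * U                          # df/dx
    return np.linalg.solve(np.eye(3) - J_z, J_x)

x = rng.standard_normal(3)
z_star = solve_fixed_point(x)
J = implicit_jacobian(z_star, x)

# Finite differences through the forward solver, for comparison.
eps = 1e-6
J_fd = np.column_stack([
    (solve_fixed_point(x + eps * e) - solve_fixed_point(x - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
```

Because the derivative depends only on the fixed point itself, the forward pass never needs to be stored or unrolled, which is the stability and memory benefit the summary refers to.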

CerDEQ: Certifiable Deep Equilibrium Model

This work aims to tackle the problem of DEQ's certified training and obtains a certifiable DEQ, called CerDEQ, which achieves state-of-the-art performance compared with models using regular convolution and linear layers on ℓ∞ tasks.

On Training Implicit Models

This work proposes a novel gradient estimate for implicit models, named the phantom gradient, that forgoes the costly computation of the exact gradient and provides an update direction that is empirically preferable for implicit model training.
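As a rough sketch of this idea (not the paper's exact estimator), the expensive matrix inverse in the exact implicit gradient can be replaced by a truncated Neumann series, trading exactness for a fixed, cheap number of matrix-vector products. The Jacobian `J` and incoming gradient `v` below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
J = rng.standard_normal((n, n))
J /= 2 * np.linalg.norm(J, 2)    # ||J|| = 0.5, so the Neumann series converges

v = rng.standard_normal(n)       # incoming loss gradient dL/dz*

def exact_grad(v):
    # Exact implicit backward pass: (I - J)^{-T} v
    return np.linalg.solve(np.eye(n) - J.T, v)

def phantom_grad(v, k):
    """Truncated Neumann series (I - J)^{-T} v ≈ sum_{i<k} (J^T)^i v:
    an inexact but cheap update direction in the spirit of the phantom
    gradient (illustrative sketch, not the paper's exact estimator)."""
    out = np.zeros(n)
    term = v.copy()
    for _ in range(k):
        out += term
        term = J.T @ term
    return out

g = exact_grad(v)
errs = [np.linalg.norm(phantom_grad(v, k) - g) for k in (1, 5, 20)]
```

With a contractive Jacobian the truncation error shrinks geometrically in `k`, so even a handful of terms gives a usable descent direction.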

LyaNet: A Lyapunov Framework for Training Neural ODEs

Theoretically, it is shown that minimizing Lyapunov loss guarantees exponential convergence to the correct solution and enables a novel robustness guarantee, and empirically, LyaNet can offer improved prediction performance, faster convergence of inference dynamics, and improved adversarial robustness.

Deep Equilibrium Models

It is shown that DEQs often improve performance over these state-of-the-art models (for similar parameter counts), have similar computational requirements to existing models, and vastly reduce memory consumption (often the bottleneck for training large sequence models), demonstrating an up to 88% memory reduction in the authors' experiments.

Monotone operator equilibrium networks

A new class of implicit-depth model based on the theory of monotone operators, the Monotone Operator Equilibrium Network (MON), is developed, which vastly outperforms Neural ODE-based models while also being more computationally efficient.

Multiscale Deep Equilibrium Models

In both settings, MDEQs are able to match or exceed the performance of recent competitive computer vision models: the first time such performance and scale have been achieved by an implicit deep learning approach.

OptNet: Differentiable Optimization as a Layer in Neural Networks

OptNet is presented, a network architecture that integrates optimization problems (here, specifically in the form of quadratic programs) as individual layers in larger end-to-end trainable deep networks, and shows how techniques from sensitivity analysis, bilevel optimization, and implicit differentiation can be used to exactly differentiate through these layers.

Neural Ordinary Differential Equations

This work shows how to scalably backpropagate through any ODE solver, without access to its internal operations, which allows end-to-end training of ODEs within larger models.

FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models

This paper uses Hutchinson's trace estimator to give a scalable unbiased estimate of the log-density and demonstrates the approach on high-dimensional density estimation, image generation, and variational inference, achieving the state-of-the-art among exact likelihood methods with efficient sampling.
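Hutchinson's estimator itself is easy to sketch. The version below uses Rademacher (±1) probe vectors, one common choice satisfying E[vvᵀ] = I; the matrix and sample count are illustrative:

```python
import numpy as np

def hutchinson_trace(A, n_samples, rng):
    """Unbiased estimate of tr(A): E[v^T A v] = tr(A) when E[v v^T] = I.
    Uses Rademacher (±1) probe vectors."""
    n = A.shape[0]
    V = rng.choice([-1.0, 1.0], size=(n_samples, n))
    return np.einsum('si,ij,sj->s', V, A, V).mean()

rng = np.random.default_rng(4)
A = rng.standard_normal((10, 10))
est = hutchinson_trace(A, n_samples=20000, rng=rng)
```

In FFJORD the same trick is applied to the Jacobian of the dynamics via vector-Jacobian products, so the log-density term never requires materializing the full Jacobian.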

Robust Large Margin Deep Neural Networks

The analysis leads to the conclusion that a bounded spectral norm of the network's Jacobian matrix in the neighbourhood of the training samples is crucial for a deep neural network of arbitrary depth and width to generalize well.
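The quantity in question, the spectral norm of the network's Jacobian at a point, can be estimated by power iteration. The two-layer tanh network below is an illustrative stand-in, with the analytic Jacobian written out so the estimate can be checked directly:

```python
import numpy as np

rng = np.random.default_rng(5)
W1 = rng.standard_normal((5, 4))     # illustrative two-layer tanh network
W2 = rng.standard_normal((3, 5))

def jacobian(x):
    """Jacobian of x -> W2 @ tanh(W1 @ x) at the point x."""
    s = 1.0 - np.tanh(W1 @ x) ** 2
    return W2 @ (s[:, None] * W1)

def spectral_norm(J, iters=200):
    """Largest singular value of J via power iteration on J^T J."""
    v = np.ones(J.shape[1]) / np.sqrt(J.shape[1])
    for _ in range(iters):
        v = J.T @ (J @ v)
        v /= np.linalg.norm(v)
    return np.linalg.norm(J @ v)

x = rng.standard_normal(4)
J = jacobian(x)
sigma = spectral_norm(J)
```

For large networks the same power iteration runs on Jacobian-vector and vector-Jacobian products instead of an explicit `J`, which is how a bound on this norm can be monitored or penalized during training.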

Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks

A reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction is presented, improving the conditioning of the optimization problem and speeding up convergence of stochastic gradient descent.
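The reparameterization is one line; `g` and `v` below are illustrative stand-ins for the learned parameters:

```python
import numpy as np

def weight_norm(v, g):
    """Weight normalization: w = g * v / ||v||, so ||w|| = g exactly
    and the direction of w is that of v."""
    return g * v / np.linalg.norm(v)

v = np.array([3.0, 4.0])     # ||v|| = 5
w = weight_norm(v, g=2.0)    # w = [1.2, 1.6]
```

During training, gradients are taken with respect to `g` and `v` separately, which is what decouples the length of the weight vector from its direction.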

Hypersolvers: Toward Fast Continuous-Depth Models

The infinite-depth paradigm pioneered by Neural ODEs has launched a renaissance in the search for novel dynamical-system-inspired deep learning primitives; however, their utilization in problems of…

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
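The training-mode transform behind that result is short. This minimal sketch normalizes each feature over the batch and then scales and shifts; the running statistics used at inference time and the backward pass are omitted, and `gamma`/`beta` stand in for the learned parameters:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode batch norm: normalize each feature over the batch,
    then scale by gamma and shift by beta (both learned in practice)."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(6)
x = 10.0 * rng.standard_normal((64, 8)) + 3.0   # shifted/scaled activations
y = batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))
```

With `gamma=1` and `beta=0` the output has zero mean and unit variance per feature, which is the "reduced internal covariate shift" the title refers to.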