Almost-Orthogonal Layers for Efficient General-Purpose Lipschitz Networks

Bernd Prach and Christoph H. Lampert. European Conference on Computer Vision (ECCV).
It is a highly desirable property for deep networks to be robust against small input changes. One popular way to achieve this property is by designing networks with a small Lipschitz constant. In this work, we propose a new technique for constructing such Lipschitz networks that has a number of desirable properties: it can be applied to any linear network layer (fully-connected or convolutional), it provides formal guarantees on the Lipschitz constant, it is easy to implement and efficient to… 
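The abstract is cut off before it describes the construction. As a hedged illustration of the kind of formal guarantee it mentions, the NumPy sketch below rescales a fully-connected weight matrix W by a diagonal matrix D with D_ii = (sum_j |WᵀW|_ij)^(-1/2), which provably gives ||WD||_2 ≤ 1. We believe this matches the rescaling described in the AOL paper, but the snippet is an independent sketch, not the authors' code: the function name `aol_rescale` and the `eps` safeguard are our own, and only the dense case is covered (not convolutions).

```python
import numpy as np


# Illustrative sketch; `aol_rescale` and `eps` are hypothetical names, not from the paper's code.
def aol_rescale(W: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Rescale the columns of W so that its spectral norm is at most 1.

    Uses D_ii = (sum_j |W^T W|_ij)^(-1/2). W^T W is dominated (in the
    positive-semidefinite order) by the diagonal matrix of its absolute row
    sums, so D W^T W D <= I, i.e. ||W D||_2 <= 1.
    """
    P = np.abs(W.T @ W)                       # (m, m) entrywise absolute Gram matrix
    d = 1.0 / np.sqrt(P.sum(axis=1) + eps)    # one scale factor per input dimension
    return W * d[np.newaxis, :]               # equivalent to W @ np.diag(d)


# Usage: a random fully-connected weight becomes provably 1-Lipschitz.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
W_aol = aol_rescale(W)
print(np.linalg.norm(W, ord=2), np.linalg.norm(W_aol, ord=2))
```

A matrix with orthonormal columns is left (almost) unchanged by this rescaling, since its Gram matrix is already the identity.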




Skew Orthogonal Convolutions

SOC makes it possible to train provably Lipschitz, large convolutional neural networks significantly faster than prior work, while substantially improving both standard and certified robust accuracy.
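SOC builds convolutions whose Jacobian is the exponential of a skew-symmetric operator, computed with a truncated Taylor series. The dense-matrix sketch below illustrates only that principle: for a skew-symmetric A, exp(A) is orthogonal, and a modest number of Taylor terms already comes close. The function name, the rescaling of A to spectral norm at most 1, and the default of 12 terms are our assumptions rather than SOC's settings; the actual method applies the idea to convolution kernels.

```python
import numpy as np


# Hypothetical helper (dense analogue of SOC, not the paper's convolutional implementation).
def skew_orthogonal_taylor(M: np.ndarray, terms: int = 12) -> np.ndarray:
    """Approximately orthogonal matrix exp(A) for the skew-symmetric A = M - M^T.

    A^T = -A, so exp(A) is exactly orthogonal; the truncated Taylor series
    sum_{k < terms} A^k / k! approximates it.
    """
    A = M - M.T
    norm = np.linalg.norm(A, ord=2)
    if norm > 1.0:
        A = A / norm                  # assumption: keep ||A||_2 <= 1 so the series converges quickly
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k           # term now equals A^k / k!
        result = result + term
    return result


rng = np.random.default_rng(0)
Q = skew_orthogonal_taylor(rng.normal(size=(16, 16)))
print(np.abs(Q.T @ Q - np.eye(16)).max())   # small, limited by the truncation
```

Fewer terms are cheaper but leave a larger deviation from orthogonality, which is the trade-off a truncated series introduces.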

Lipschitz regularity of deep neural networks: analysis and efficient estimation

This paper introduces AutoLip, the first generic algorithm for upper-bounding the Lipschitz constant of any automatically differentiable function, and proposes an improved algorithm, SeqLip, which takes advantage of a linear (chain-like) computation graph to split the computation per pair of consecutive layers.
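For a plain feed-forward network with 1-Lipschitz activations such as ReLU, a generic bound of the AutoLip kind reduces to the product of the per-layer spectral norms, each of which can be estimated by power iteration. The sketch below is our own minimal illustration of that special case (function names and the iteration count are assumptions); it does not implement SeqLip's tighter per-pair refinement. Power iteration approaches the spectral norm from below, so a strictly sound bound needs a small safety margin or an exact SVD.

```python
import numpy as np


# Illustrative sketch; all function names here are hypothetical.
def spectral_norm(W: np.ndarray, iters: int = 1000, seed: int = 0) -> float:
    """Estimate the largest singular value of W by power iteration on W^T W."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=W.shape[1])
    v /= np.linalg.norm(v)
    sigma = 0.0
    for _ in range(iters):
        u = W @ v
        u /= np.linalg.norm(u)
        v = W.T @ u
        sigma = float(np.linalg.norm(v))   # ||W^T u|| <= sigma_max, with equality at convergence
        v /= sigma
    return sigma


def product_bound(weights) -> float:
    """Lipschitz upper bound for x -> W_L relu(... relu(W_1 x)): product of layer norms."""
    bound = 1.0
    for W in weights:
        bound *= spectral_norm(W)
    return bound


def mlp(x, weights):
    for W in weights[:-1]:
        x = np.maximum(W @ x, 0.0)         # ReLU is 1-Lipschitz
    return weights[-1] @ x


rng = np.random.default_rng(0)
weights = [rng.normal(size=(32, 16)), rng.normal(size=(32, 32)), rng.normal(size=(4, 32))]
L = product_bound(weights)
x, y = rng.normal(size=16), rng.normal(size=16)
ratio = np.linalg.norm(mlp(x, weights) - mlp(y, weights)) / np.linalg.norm(x - y)
print(L, ratio)   # the empirical ratio never exceeds the bound
```

The gap between `ratio` and `L` is typically large, which is exactly the looseness that motivates refinements such as SeqLip.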

Improved deterministic ℓ2 robustness on CIFAR-10 and CIFAR-100

This paper proposes a procedure for certifying the robustness of 1-Lipschitz CNNs that relaxes the orthogonalization of the last linear layer of the network; it significantly advances the state of the art for both standard and provable robust accuracy on CIFAR-100.
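Why a relaxed last layer is still compatible with certification can be sketched with a standard margin argument: if the feature extractor g is 1-Lipschitz, a perturbation of ℓ2 norm ε moves the features by at most ε, so the logit difference between the predicted class y and any class j changes by at most ||w_y - w_j||_2 · ε. The prediction is therefore certified up to radius min over j ≠ y of (f_y - f_j) / ||w_y - w_j||_2. This NumPy illustration is ours, not the paper's code; `certify` and the random weights are hypothetical, and the paper's exact normalization of the last layer may differ.

```python
import numpy as np


# Hypothetical helper illustrating the margin argument; not the paper's implementation.
def certify(z: np.ndarray, W_last: np.ndarray, b_last: np.ndarray):
    """Predicted class and certified l2 radius, assuming z = g(x) for a 1-Lipschitz g.

    Moving x by delta moves z by at most ||delta||, which changes the logit
    difference (w_y - w_j) . z by at most ||w_y - w_j|| * ||delta||.
    """
    logits = W_last @ z + b_last
    y = int(np.argmax(logits))
    radius = np.inf
    for j in range(len(logits)):
        if j == y:
            continue
        margin = logits[y] - logits[j]
        radius = min(radius, margin / np.linalg.norm(W_last[y] - W_last[j]))
    return y, radius


rng = np.random.default_rng(0)
W_last, b_last = rng.normal(size=(10, 64)), rng.normal(size=10)
z = rng.normal(size=64)          # stand-in for the output of a 1-Lipschitz backbone
y, r = certify(z, W_last, b_last)
print(y, r)
```

Because each pair (y, j) is normalized by its own ||w_y - w_j||, the last layer does not need to be orthogonal for the certificate to hold.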

Orthogonalizing Convolutional Layers with the Cayley Transform

This work proposes and evaluates an alternative approach to directly parameterize convolutional layers that are constrained to be orthogonal, and shows that networks incorporating the layer outperform existing deterministic methods for certified defense against ℓ2-norm-bounded adversaries, while scaling to larger architectures than previously investigated.
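The Cayley transform maps any skew-symmetric matrix A to an orthogonal matrix Q = (I + A)^(-1)(I - A). The dense sketch below shows just that mapping on an unconstrained parameter M; the paper's contribution is applying it to convolutions (evaluated in the Fourier domain), which this snippet does not attempt. The function name is ours, and we use `np.linalg.solve` rather than an explicit inverse for numerical stability.

```python
import numpy as np


# Hypothetical helper: dense-matrix Cayley transform only, no convolution/FFT handling.
def cayley_orthogonal(M: np.ndarray) -> np.ndarray:
    """Orthogonal matrix Q = (I + A)^{-1} (I - A) from the skew-symmetric A = M - M^T.

    I + A is always invertible because the eigenvalues of a real skew-symmetric
    matrix are purely imaginary.
    """
    A = M - M.T
    I = np.eye(A.shape[0])
    return np.linalg.solve(I + A, I - A)   # solves (I + A) Q = I - A


rng = np.random.default_rng(0)
Q = cayley_orthogonal(rng.normal(size=(8, 8)))
print(np.allclose(Q.T @ Q, np.eye(8)))
```

Since M is unconstrained, it can be trained with ordinary gradient descent while Q stays exactly orthogonal up to floating-point error.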

Sorting out Lipschitz function approximation

This work identifies a necessary property for such an architecture, namely that each layer must preserve the gradient norm during backpropagation, and proposes combining a gradient-norm-preserving activation function, GroupSort, with norm-constrained weight matrices; the resulting networks are universal Lipschitz function approximators.
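GroupSort splits the pre-activations into groups and sorts each group. The short NumPy sketch below (the function name and the default group size of 2 are our choices) shows why it fits the gradient-norm-preservation requirement: sorting only permutes values, so it is 1-Lipschitz and its Jacobian is a permutation matrix almost everywhere. With groups of size 2 it corresponds to the MaxMin activation, up to the order of the two outputs.

```python
import numpy as np


# Hypothetical helper name; sorts in ascending order within each group.
def group_sort(x: np.ndarray, group_size: int = 2) -> np.ndarray:
    """Sort the last axis of x within consecutive groups of `group_size` entries.

    Within each group the output is a permutation of the input, so the Jacobian
    is a permutation matrix wherever it exists and gradient norms are preserved.
    """
    *batch, n = x.shape
    if n % group_size != 0:
        raise ValueError("last dimension must be divisible by group_size")
    grouped = x.reshape(*batch, n // group_size, group_size)
    return np.sort(grouped, axis=-1).reshape(*batch, n)


print(group_sort(np.array([3.0, 1.0, 2.0, 5.0])))   # [1. 3. 2. 5.]
```

Unlike ReLU, which zeroes out negative inputs and can shrink gradient norms, this activation never discards information.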

Controllable Orthogonalization in Training DNNs

  • Lei Huang, Li Liu, L. Shao
  • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
This paper proposes ONI, a computationally efficient and numerically stable orthogonalization method based on Newton's iteration, to learn a layer-wise orthogonal weight matrix in DNNs, and shows that effectively controlling the degree of orthogonality improves the performance of image classification networks.
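A minimal sketch of the idea, based on our reading of ONI: center and Frobenius-normalize an unconstrained matrix Z to get V, then run Newton's iteration B ← 1.5B - 0.5B³S with S = VVᵀ, which drives B toward S^(-1/2), so that W = BV has WWᵀ → I. The iteration count T is the knob that controls how orthogonal W is. The function name and sizes are assumptions, the snippet only covers a dense (n ≤ d) weight, and it deliberately uses few iterations, because this uncoupled form of Newton's iteration can amplify rounding errors when run for many steps on an ill-conditioned S.

```python
import numpy as np


# Hypothetical helper; a dense sketch of orthogonalization by Newton's iteration.
def oni(Z: np.ndarray, iters: int = 5) -> np.ndarray:
    """Approximately row-orthonormal W from an unconstrained Z of shape (n, d), n <= d.

    B <- 1.5 B - 0.5 B^3 S drives B towards S^{-1/2} for S = V V^T,
    so W = B V satisfies W W^T -> I as the number of iterations grows.
    """
    V = Z - Z.mean(axis=1, keepdims=True)   # center each row
    V = V / np.linalg.norm(V)               # Frobenius normalization: eigenvalues of S lie in (0, 1]
    S = V @ V.T
    B = np.eye(S.shape[0])
    for _ in range(iters):
        B = 1.5 * B - 0.5 * B @ B @ B @ S
    return B @ V


rng = np.random.default_rng(0)
Z = rng.normal(size=(16, 64))
for T in (1, 3, 5):
    W = oni(Z, iters=T)
    print(T, np.linalg.norm(W @ W.T - np.eye(16), ord=2))   # deviation shrinks as T grows
```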

Intriguing properties of neural networks

It is found that, according to various methods of unit analysis, there is no distinction between individual high-level units and random linear combinations of high-level units, which suggests that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.

Regularisation of neural networks by enforcing Lipschitz continuity

A technique for computing an upper bound on a network's Lipschitz constant is used to formulate training a neural network with a bounded Lipschitz constant as a constrained optimisation problem that can be solved using projected stochastic gradient methods; the resulting models are shown to outperform models trained with other common regularisers.
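The constrained-optimisation view can be sketched as gradient descent with a projection step: after each update, a weight matrix whose spectral norm exceeds the target λ is divided by max(1, ||W||_2 / λ). We believe this corresponds to the projection described in the paper for the ℓ2 case, but the NumPy sketch below is our own illustration on a toy least-squares problem; the function name, the step size and λ = 1 are hypothetical choices, and the norm is computed exactly with `np.linalg.norm` rather than estimated. This rescaling keeps W inside the norm ball but is not the exact Euclidean projection, which would instead clip the singular values.

```python
import numpy as np


# Hypothetical helper; assumes the l2 operator (spectral) norm.
def project_to_norm_ball(W: np.ndarray, max_norm: float) -> np.ndarray:
    """Rescale W so that its spectral norm is at most max_norm (no change if already inside)."""
    sigma = np.linalg.norm(W, ord=2)
    return W / max(1.0, sigma / max_norm)


# Toy problem: the unconstrained least-squares solution has a large spectral
# norm, so the constraint is active throughout training.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
W_true = 3.0 * rng.normal(size=(5, 10))
Y = X @ W_true.T
W = np.zeros((5, 10))
max_norm, lr = 1.0, 0.05
for _ in range(200):
    grad = 2.0 * (X @ W.T - Y).T @ X / len(X)   # gradient of the mean squared error w.r.t. W
    W = project_to_norm_ball(W - lr * grad, max_norm)
print(np.linalg.norm(W, ord=2))
```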

Patches Are All You Need?

This paper proposes ConvMixer, an extremely simple model that is similar in spirit to the ViT and the even more basic MLP-Mixer: it operates directly on patches as input, separates the mixing of spatial and channel dimensions, and maintains equal size and resolution throughout the network.
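To make "operates directly on patches" and "separates the mixing of spatial and channel dimensions" concrete, here is a single-image NumPy sketch of a patch embedding followed by ConvMixer-style blocks: a per-channel (depthwise) convolution with a residual connection for spatial mixing, then a 1x1 (pointwise) step for channel mixing. This is a simplified illustration under our own assumptions: batch normalization is omitted, GELU uses the tanh approximation, and all function names, kernel sizes and weight scales are hypothetical.

```python
import numpy as np


# Simplified, hypothetical sketch: no batch normalization, single image, random weights.
def gelu(x: np.ndarray) -> np.ndarray:
    """GELU activation, tanh approximation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def patch_embed(img: np.ndarray, W_embed: np.ndarray, p: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping p x p patches and project each one linearly.

    Equivalent to a convolution with kernel size = stride = p; returns (H/p, W/p, h).
    """
    H, W, C = img.shape
    patches = img.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return patches.reshape(H // p, W // p, p * p * C) @ W_embed


def depthwise_conv_same(x: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Per-channel k x k cross-correlation with zero 'same' padding. x: (H, W, h), K: (k, k, h), k odd."""
    k = K.shape[0]
    r = k // 2
    H, W, _ = x.shape
    xp = np.pad(x, ((r, r), (r, r), (0, 0)))
    out = np.zeros_like(x)
    for i in range(k):
        for j in range(k):
            out += xp[i:i + H, j:j + W, :] * K[i, j, :]
    return out


def convmixer_block(x: np.ndarray, K_dw: np.ndarray, W_pw: np.ndarray) -> np.ndarray:
    """Spatial mixing (depthwise conv + residual), then channel mixing (pointwise / 1x1 conv)."""
    x = x + gelu(depthwise_conv_same(x, K_dw))
    return gelu(x @ W_pw)


rng = np.random.default_rng(0)
p, h, k = 4, 16, 5
img = rng.normal(size=(32, 32, 3))
x = patch_embed(img, rng.normal(size=(p * p * 3, h)) * 0.1, p)
for _ in range(2):   # every block keeps the (8, 8, 16) size and resolution
    x = convmixer_block(x, rng.normal(size=(k, k, h)) * 0.1, rng.normal(size=(h, h)) * 0.1)
print(x.shape)   # (8, 8, 16)
```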

Parseval Networks: Improving Robustness to Adversarial Examples

Parseval networks are shown to match the state of the art in accuracy on CIFAR-10/100 and Street View House Numbers (SVHN), while being more robust than their vanilla counterparts against adversarial examples.