Margin Preservation of Deep Neural Networks

Jure Sokolić, Raja Giryes, Guillermo Sapiro, Miguel R. D. Rodrigues
The generalization error of deep neural networks via their classification margin is studied in this work, providing novel generalization error bounds that are independent of the network depth, thereby avoiding the common exponential depth-dependency which is unrealistic for current networks with hundreds of layers. We show that a large margin linear classifier operating at the output of a deep neural network induces a large classification margin at the input of the network, provided that the… 

Generalization Error in Deep Learning

This chapter provides an overview of the existing theory and bounds for the characterization of the generalization error of deep neural networks, combining both classical and more recent theoretical and empirical results.

Adversarial Noise Attacks of Deep Learning Architectures: Stability Analysis via Sparse-Modeled Signals

This paper analyzes the stability of state-of-the-art deep learning classification machines to adversarial perturbations, where it is assumed that the signals belong to the (possibly multilayer) sparse representation model.

An analysis of training and generalization errors in shallow and deep networks

On Symmetry and Initialization for Neural Networks

This work considers neural networks with one hidden layer and shows that when learning symmetric functions, one can choose initial conditions so that standard SGD training efficiently produces generalization guarantees.

Jacobian Regularization for Mitigating Universal Adversarial Perturbations

It is empirically verified that Jacobian regularization increases model robustness to UAPs by up to four times while maintaining clean performance. This suggests that realistic and practical universal attacks can be reliably mitigated without sacrificing clean accuracy, a promising result for the robustness of machine learning systems.

An Empirical Study on the Relation Between Network Interpretability and Adversarial Robustness

It is demonstrated that training the networks to have interpretable gradients improves their robustness to adversarial perturbations, and the results indicate that the interpretability of the model gradients is a crucial factor for adversarial robustness.

Study on the Lightweighting Strategy of Target Detection Model with Deep Learning

  • Junli Hu
  • Computer Science
    Advances in Multimedia
  • 2022
Experiments on the public data set AI-TOD show that the target detection lightweight model of deep learning has stronger detection ability and higher average detection accuracy than other algorithms, which proves the applicability and effectiveness of this algorithm.

HALO: Hardware-Aware Learning to Optimize

This work proposes hardware-aware learning to optimize (HALO), a practical meta-optimizer dedicated to resource-efficient on-device adaptation. Its highlights are faster adaptation speed, lower per-iteration complexity, and an enforced stochastic structural sparsity regularizer.

LATIN 2020: Theoretical Informatics: 14th Latin American Symposium, São Paulo, Brazil, January 5-8, 2021, Proceedings

A PTAS for Steiner tree on map graphs is obtained, which builds on the result of Borradaile et al. for planar edge-weighted instances and proves and uses a contraction decomposition theorem for planar node-weighted instances.



Large Margin Deep Neural Networks: Theory and Algorithms

A new margin bound for DNNs is derived, in which the expected 0-1 error of a DNN model is upper bounded by its empirical margin error plus a Rademacher-average-based capacity term, which is consistent with the empirical behavior of DNN models.

Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?

It is formally proved that these networks with random Gaussian weights perform a distance-preserving embedding of the data, with a special treatment for in-class and out-of-class data.

Deep Learning using Linear Support Vector Machines

The results using L2-SVMs show that simply replacing softmax with linear SVMs gives significant gains on the popular deep learning datasets MNIST and CIFAR-10, and on the ICML 2013 Representation Learning Workshop's face expression recognition challenge.

Deep Residual Learning for Image Recognition

This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.

Breaking the Curse of Dimensionality with Convex Neural Networks

  • F. Bach
  • Computer Science
    J. Mach. Learn. Res.
  • 2017
This work considers neural networks with a single hidden layer and non-decreasing homogeneous activation functions like the rectified linear units and shows that they are adaptive to unknown underlying linear structures, such as the dependence on the projection of the input variables onto a low-dimensional subspace.

Contractive Rectifier Networks for Nonlinear Maximum Margin Classification

Experimental results demonstrate that the proposed contractive rectifier networks consistently outperform their conventional unconstrained rectifier network counterparts.

Discriminative Robust Transformation Learning

A framework is proposed for learning features that are robust to data variation, which is particularly important when only a limited number of training samples are available. The framework provides theoretical justification for the reductions in generalization error observed in experiments.

On the Number of Linear Regions of Deep Neural Networks

We study the complexity of functions computable by deep feedforward neural networks with piecewise linear activations in terms of the symmetries and the number of linear regions that they have.

ImageNet classification with deep convolutional neural networks

A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.

Global Optimality in Tensor Factorization, Deep Learning, and Beyond

This framework derives sufficient conditions to guarantee that a local minimum of the non-convex optimization problem is a global minimum and shows that if the size of the factorized variables is large enough then from any initialization it is possible to find a global minimizer using a purely local descent algorithm.