Corpus ID: 3927382

Large Margin Deep Networks for Classification

@inproceedings{elsayed2018largemargin,
  title={Large Margin Deep Networks for Classification},
  author={Gamaleldin F. Elsayed and Dilip Krishnan and Hossein Mobahi and Kevin Regan and Samy Bengio},
  booktitle={Neural Information Processing Systems},
}
We present a formulation of deep learning that aims at producing a large margin classifier. The notion of margin, the minimum distance to a decision boundary, has served as the foundation of several theoretically profound and empirically successful results for both classification and regression tasks. However, most large margin algorithms are applicable only to shallow models with a preset feature representation, and conventional margin methods for neural networks only enforce margin at the output…
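To make the margin notion in the abstract concrete, here is a minimal numpy sketch for a linear classifier, where the distance to each pairwise decision boundary has a closed form; the paper's deep formulation replaces the weight-difference norm with the norm of the input gradient of the logit difference (a first-order approximation). The toy weights and the choice of the predicted class as reference are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Sketch: margin of a linear classifier f(x) = W x + b. The distance from x to
# the decision boundary between classes i and j is exact here:
#   d_ij(x) = (f_i(x) - f_j(x)) / ||w_i - w_j||
# For a deep net, ||w_i - w_j|| is replaced by the norm of the input gradient
# of the logit difference, giving a first-order margin approximation.

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))    # toy weights: 3 classes, 4 input features
b = rng.normal(size=3)
x = rng.normal(size=4)

logits = W @ x + b
y = int(np.argmax(logits))     # use the predicted class as the reference class

# signed distance from x to each pairwise boundary (y vs. j)
margins = [
    (logits[y] - logits[j]) / np.linalg.norm(W[y] - W[j])
    for j in range(3) if j != y
]
margin = min(margins)          # distance to the closest boundary
print(f"margin of x: {margin:.4f}")
```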


Improved Sample Complexities for Deep Neural Networks and Robust Classification via an All-Layer Margin

This work presents a theoretically inspired training algorithm for increasing the all-layer margin and demonstrates that the algorithm improves test performance over strong baselines in practice and obtains tighter generalization bounds for neural nets which depend on Jacobian and hidden layer norms.

Improved Sample Complexities for Deep Networks and Robust Classification via an All-Layer Margin

This work analyzes a new notion of margin, the all-layer margin, which is shown to have a clear and direct relationship with generalization for deep models, and presents a theoretically inspired training algorithm for increasing it.

Deep Large-Margin Rank Loss for Multi-Label Image Classification

Experimental results show that the deep large-margin ranking function improves the robustness of the model in multi-label image classification tasks while enhancing the model’s anti-noise performance.

1-to-N Large Margin Classifier

This work presents a novel formulation that aims to produce generalization and noise label robustness not only by imposing a margin at the top of the neural network, but also by using the entire structure of the mini-batch data.

Predicting the Generalization Gap in Deep Networks with Margin Distributions

This paper proposes a measure based on the concept of margin distribution, that is, the distances of training points to the decision boundary, and finds that it is necessary to use margin distributions at multiple layers of a deep network.

Margin-Based Regularization and Selective Sampling in Deep Neural Networks

This work derives a new margin-based regularization formulation, termed multi-margin regularization (MMR), for deep neural networks (DNNs), and demonstrates accelerated training of DNNs by selecting samples according to a minimal margin score (MMS).
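The minimal-margin idea above can be sketched with a top-2 logit gap: examples whose largest and second-largest logits are close sit near the decision boundary. The batch-selection helper below is an illustrative simplification, not the paper's exact MMS.

```python
import numpy as np

def minimal_margin_scores(logits: np.ndarray) -> np.ndarray:
    """Top-1 minus top-2 logit per sample; small values mean 'near the boundary'."""
    part = np.partition(logits, -2, axis=1)
    return part[:, -1] - part[:, -2]

def select_hard_examples(logits: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k samples with the smallest margin scores."""
    return np.argsort(minimal_margin_scores(logits))[:k]

logits = np.array([[4.0, 1.0, 0.0],     # confident: score 3.0
                   [2.0, 1.9, 0.0],     # borderline: score 0.1
                   [0.5, 0.4, 0.45]])   # borderline: score 0.05
print(select_hard_examples(logits, 2))  # the two lowest-margin samples: [2 1]
```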

Hold me tight! Influence of discriminative features on deep network boundaries

This work rigorously confirms that neural networks exhibit a high invariance to non-discriminative features, and shows that the decision boundaries of a DNN can only exist as long as the classifier is trained with some features that hold them together.

Improving Adversarial Robustness of CNNs via Maximum Margin

It is proved that an SVM auxiliary classifier can constrain the high-layer feature maps of the original network to enlarge their margin, thereby improving the inter-class separability and intra-class compactness of the network.

Adversarial Margin Maximization Networks

This paper proposes adversarial margin maximization (AMM), a learning-based regularization which exploits an adversarial perturbation as a proxy and encourages a large margin in the input space, just like the support vector machines.
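As a sketch of using an adversarial perturbation as a margin proxy: for a linear classifier, the smallest perturbation that reaches the boundary between the predicted class and a rival can be written in closed form, and its norm equals the input-space margin. The random toy weights below are assumptions; AMM itself estimates such perturbations for deep networks and penalizes small ones.

```python
import numpy as np

# Sketch: for a linear classifier, the smallest perturbation that moves x onto
# the boundary between the predicted class y and a rival j is
#   delta = -(f_y(x) - f_j(x)) * (w_y - w_j) / ||w_y - w_j||^2 ,
# and ||delta|| is the input-space margin.

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 4))    # toy weights: 3 classes, 4 input features
x = rng.normal(size=4)
logits = W @ x
y = int(np.argmax(logits))

best = None
for j in range(3):
    if j == y:
        continue
    w = W[y] - W[j]
    gap = logits[y] - logits[j]
    delta = -gap * w / (w @ w)          # minimal step onto the y/j boundary
    if best is None or np.linalg.norm(delta) < np.linalg.norm(best):
        best = delta

x_adv = x + 1.001 * best                # step just past the closest boundary
print("prediction flips:", int(np.argmax(W @ x_adv)) != y)
```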

Recent Advances in Large Margin Learning

A survey of recent advances in large margin training and its theoretical foundations for (nonlinear) deep neural networks (DNNs), arguably the most prominent machine learning models for large-scale data over the past decade.



Margin maximization for robust classification using deep learning

This work introduces a novel margin maximization objective for deep neural networks and theoretically shows that the proposed objective is equivalent to the robust optimization problem for a neural network.

Large Margin Deep Neural Networks: Theory and Algorithms

A new margin bound for DNNs is derived, in which the expected 0-1 error of a DNN model is upper bounded by its empirical margin plus a Rademacher-average-based capacity term, which is consistent with the empirical behavior of DNN models.

Robust Large Margin Deep Neural Networks

The analysis leads to the conclusion that a bounded spectral norm of the network's Jacobian matrix in the neighbourhood of the training samples is crucial for a deep neural network of arbitrary depth and width to generalize well.

Large-Margin Softmax Loss for Convolutional Neural Networks

A generalized large-margin softmax (L-Softmax) loss is proposed that explicitly encourages intra-class compactness and inter-class separability between learned features, and that can not only adjust the desired margin but also avoid overfitting.

Soft-Margin Softmax for Deep Classification

A novel soft-margin softmax (SM-Softmax) loss is proposed to improve the discriminative power of features; it can not only adjust the desired continuous soft margin but also be easily optimized by typical stochastic gradient descent (SGD).
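A simplified additive-margin sketch of this idea (not the exact SM-Softmax formulation): subtracting a margin m from the true-class logit before the softmax forces that logit to exceed the others by at least m to achieve the same loss, and the result remains a plain differentiable cross-entropy that SGD can optimize.

```python
import numpy as np

def soft_margin_softmax_loss(logits, label, m=1.0):
    """Cross-entropy after subtracting a margin m from the true-class logit."""
    z = np.asarray(logits, dtype=float).copy()
    z[label] -= m               # the target must now win by at least m
    z -= z.max()                # numerical stability
    return -(z[label] - np.log(np.exp(z).sum()))

logits = np.array([3.0, 1.0, 0.5])
loss_plain = soft_margin_softmax_loss(logits, 0, m=0.0)   # ordinary softmax CE
loss_margin = soft_margin_softmax_loss(logits, 0, m=1.0)  # margin-penalized
print(f"plain: {loss_plain:.4f}, with margin: {loss_margin:.4f}")
```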

Parseval Networks: Improving Robustness to Adversarial Examples

It is shown that Parseval networks match the state-of-the-art in terms of accuracy on CIFAR-10/100 and Street View House Numbers while being more robust than their vanilla counterpart against adversarial examples.

Universal Adversarial Perturbations

The surprising existence of universal perturbations reveals important geometric correlations among the high-dimensional decision boundary of classifiers and outlines potential security breaches with the existence of single directions in the input space that adversaries can possibly exploit to break a classifier on most natural images.

The Implicit Bias of Gradient Descent on Separable Data

We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard-margin SVM) solution.
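This implicit-bias result can be checked numerically on a toy separable dataset chosen so that, by symmetry, the hard-margin direction is (1, 1)/sqrt(2) (the point (4, 1) and its mirror image are not support vectors); the dataset and hyperparameters below are illustrative assumptions.

```python
import numpy as np

# Toy separable data: three positives and their mirror images as negatives.
# Supports are (1, 2) and (2, 1), so the hard-margin SVM direction is
# (1, 1)/sqrt(2); (4, 1) is classified with room to spare and has no
# influence on the limit direction.
X = np.array([[1.0, 2.0], [2.0, 1.0], [4.0, 1.0],
              [-1.0, -2.0], [-2.0, -1.0], [-4.0, -1.0]])
y = np.array([1, 1, 1, -1, -1, -1])

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

w = np.zeros(2)
lr = 0.1
for _ in range(5000):
    # gradient of sum_i log(1 + exp(-y_i * w.x_i))
    grad = -(y * sigmoid(-y * (X @ w))) @ X
    w -= lr * grad

w_hat = w / np.linalg.norm(w)
w_star = np.array([1.0, 1.0]) / np.sqrt(2)   # hard-margin direction
print(f"||w|| = {np.linalg.norm(w):.2f}, cos(w, w*) = {w_hat @ w_star:.4f}")
```

The norm of w keeps growing (the loss has no finite minimizer on separable data), while the normalized direction stabilizes near the max-margin solution, matching the paper's claim.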

Wide Residual Networks

This paper conducts a detailed experimental study on the architecture of ResNet blocks and proposes a novel architecture where the depth of residual networks is decreased and their width increased; the resulting network structures, called wide residual networks (WRNs), are far superior to their commonly used thin and very deep counterparts.

Towards Deep Learning Models Resistant to Adversarial Attacks

This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.