• Corpus ID: 235359196

Relative stability toward diffeomorphisms indicates performance in deep nets

Leonardo Petrini, Alessandro Favero, Mario Geiger, Matthieu Wyart
Understanding why deep nets can classify data in large dimensions remains a challenge. It has been proposed that they do so by becoming stable to diffeomorphisms, yet existing empirical measurements suggest that this is often not the case. We revisit this question by defining a maximum-entropy distribution on diffeomorphisms, which allows us to study typical diffeomorphisms of a given norm. We confirm that stability toward diffeomorphisms does not strongly correlate with performance on benchmark data… 
Learning sparse features can lead to overfitting in neural networks
It is shown that feature learning can perform worse than lazy training (via random feature kernel or the NTK) as the former can lead to a sparser neural representation, and it is empirically shown that learning features can indeed lead to sparse and thereby less smooth representations of the image predictors.
How Wide Convolutional Neural Networks Learn Hierarchical Tasks
It is shown that the spectrum of the corresponding kernel and its asymptotics inherit the hierarchical structure of the network, which implies that despite their hierarchical structure, the functions generated by deep CNNs are too rich to be efficiently learnable in high dimension.
Synergy and Symmetry in Deep Learning: Interactions between the Data, Model, and Inference Algorithm
This paper analyzes the triplet (D, M, I) as an integrated system and identifies important synergies that help mitigate the curse of dimensionality.
Deeper Insights into ViTs Robustness towards Common Corruptions
This paper investigates, through extensive and rigorous benchmarking, how CNN-like architectural designs and CNN-based data augmentation strategies affect ViTs’ robustness towards common corruptions, and introduces a novel conditional method enabling input-varied augmentations from two angles.
Data augmentation with mixtures of max-entropy transformations for filling-level classification
This work addresses the problem of distribution shifts in test-time data with a principled data augmentation scheme that can replace current approaches based on transfer learning, or be combined with transfer learning to improve its performance.
Measuring dissimilarity with diffeomorphism invariance
This work introduces DID, a pairwise dissimilarity measure applicable to a wide range of data spaces, which leverages the data’s internal structure to be invariant to diffeomorphisms and proves that DID enjoys properties which make it relevant for theoretical study and practical use.
PRIME: A Few Primitives Can Boost Robustness to Common Corruptions
This work proposes PRIME, a general data augmentation scheme consisting of simple families of max-entropy image transformations. It outperforms the prior art for corruption robustness, and its simplicity and plug-and-play nature enable it to be combined with other methods to further boost their robustness.
On the Sample Complexity of Learning under Invariance and Geometric Stability
This work provides non-parametric rates of convergence for kernel methods, and shows improvements in sample complexity by a factor equal to the size of the group when using an invariant kernel over the group, compared to the corresponding non-invariant kernel.


Group Invariance, Stability to Deformations, and Complexity of Deep Convolutional Representations
This paper considers deep convolutional representations of signals; it studies their invariance to translations and to more general groups of transformations, their stability to the action of diffeomorphisms, and their ability to preserve signal information.
Geometric compression of invariant manifolds in neural networks
It is shown that compression shapes the evolution of the neural tangent kernel (NTK) in time, so that its top eigenvectors become more informative and display a larger projection on the labels. Kernel principal component analysis on the evolving NTK is put forward as a useful diagnostic of compression in deep networks.
Manitest: Are classifiers really invariant?
The Manitest method is proposed, built on the efficient Fast Marching algorithm to compute the invariance of classifiers, which quantifies in particular the importance of data augmentation for learning invariance from data, and the increased invariances of convolutional neural networks with depth.
Pooling is neither necessary nor sufficient for appropriate deformation stability in CNNs
These findings provide new insights into the role of interleaved pooling and deformation invariance in CNNs, and demonstrate the importance of rigorous empirical testing of even the most basic assumptions about the working of neural networks.
How isotropic kernels perform on simple invariants
It is found that β ∼ 1/d independently of d∥, supporting previous findings that the presence of invariants does not resolve the curse of dimensionality for kernel regression, and improves classical bounds obtainable from Rademacher complexity.
Why do deep convolutional networks generalize so poorly to small image transformations?
The results indicate that the problem of ensuring invariance to small image transformations in neural networks while preserving high accuracy remains unsolved.
Landscape and training regimes in deep learning
Geometric Robustness of Deep Networks: Analysis and Improvement
This work proposes ManiFool, a simple yet scalable algorithm to measure the invariance of deep networks, and builds on it to propose a new adversarial training scheme, showing its effectiveness in improving the invariance properties of deep neural networks.
Building high-level features using large scale unsupervised learning
Contrary to what appears to be a widely-held intuition, the experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not.
Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss
It is shown that the limits of the gradient flow on exponentially tailed losses can be fully characterized as a max-margin classifier in a certain non-Hilbertian space of functions.