Corpus ID: 233210580

Noether: The More Things Change, the More Stay the Same

@article{Gluch2021NoetherTM,
  title={Noether: The More Things Change, the More Stay the Same},
  author={Grzegorz Gluch and R{\"u}diger L. Urbanke},
  journal={ArXiv},
  year={2021},
  volume={abs/2104.05508}
}
Symmetries have proven to be important ingredients in the analysis of neural networks. So far their use has mostly been implicit or seemingly coincidental. We undertake a systematic study of the role that symmetry plays. In particular, we clarify how symmetry interacts with the learning algorithm. The key ingredient in our study is Noether’s celebrated theorem which, informally speaking, states that symmetry leads to conserved quantities (e.g., conservation of energy or conservation…
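As a quick, hedged illustration of the theorem's classical content (a toy example added for concreteness, not taken from the paper): time-translation symmetry of a harmonic oscillator implies conservation of its total energy, which can be checked numerically.

```python
import numpy as np

# Toy illustration (not from the paper): time-translation symmetry of the
# harmonic oscillator implies conservation of total energy via Noether's theorem.
# We integrate the dynamics with a leapfrog scheme and track the energy drift.

k, m, dt, steps = 1.0, 1.0, 1e-3, 100_000
q, p = 1.0, 0.0                       # initial position and momentum

def energy(q, p):
    return p**2 / (2 * m) + k * q**2 / 2

e0 = energy(q, p)
for _ in range(steps):
    p -= 0.5 * dt * k * q             # half kick
    q += dt * p / m                   # drift
    p -= 0.5 * dt * k * q             # half kick

print(f"relative energy drift after {steps} steps: {abs(energy(q, p) - e0) / e0:.2e}")
```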


Geometric Deep Learning and Equivariant Neural Networks
The mathematical foundations of geometric deep learning are surveyed, focusing on group equivariant and gauge equivariant neural networks and on the use of Fourier analysis involving Wigner matrices, spherical harmonics, and Clebsch–Gordan coefficients for G = SO(3), illustrating the power of representation theory for deep learning.
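A minimal sketch of what group equivariance means in practice (my own toy example; the filter and signal are arbitrary choices): a circular convolution commutes with cyclic shifts of its input.

```python
import numpy as np

# Minimal sketch (toy example): a circular convolution layer is equivariant to
# the cyclic group of shifts, i.e. layer(shift(x)) == shift(layer(x)).

rng = np.random.default_rng(0)
n = 8
x = rng.normal(size=n)          # signal on Z_n
w = rng.normal(size=3)          # filter of width 3

def layer(x):
    # circular cross-correlation with the filter w
    return sum(w[k] * np.roll(x, -k) for k in range(len(w)))

def shift(x, s=1):
    return np.roll(x, s)

print("equivariant:", np.allclose(layer(shift(x)), shift(layer(x))))
```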
Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances
It is shown that adding one extra neuron to each layer is sufficient to connect all these previously discrete minima into a single manifold, providing new insights into the minimization of the non-convex loss function of overparameterized neural networks.
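For concreteness, a small hedged sketch (toy network and weights chosen arbitrarily) of the symmetry behind those discrete minima: permuting the hidden neurons together with their outgoing weights leaves the network function, and hence the loss, unchanged.

```python
import numpy as np

# Sketch (toy example): permuting the hidden neurons of a one-hidden-layer
# network, together with the matching outgoing weights, leaves the network
# function -- and hence the loss -- unchanged. This symmetry produces many
# copies of every minimum in the loss landscape.

rng = np.random.default_rng(0)
d, h = 4, 5
W1 = rng.normal(size=(h, d))    # hidden-layer weights
a  = rng.normal(size=h)         # output weights
x  = rng.normal(size=d)

def f(W1, a, x):
    return a @ np.maximum(W1 @ x, 0.0)   # ReLU network

perm = rng.permutation(h)
print(np.isclose(f(W1, a, x), f(W1[perm], a[perm], x)))  # True
```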
Rethinking the Variational Interpretation of Nesterov's Accelerated Method
The continuous-time model of Nesterov’s momentum provides a thought-provoking perspective for understanding the nature of the acceleration phenomenon in convex optimization. One of the main ideas in…
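A rough sketch of the discrete method whose continuous-time limit is studied in that line of work (my own toy comparison on an ill-conditioned quadratic; the step size and problem are assumptions): Nesterov's momentum versus plain gradient descent.

```python
import numpy as np

# Sketch (toy comparison, not code from the paper): Nesterov's accelerated
# method versus plain gradient descent on an ill-conditioned convex quadratic.

rng = np.random.default_rng(0)
d = 50
A = np.diag(np.logspace(-3, 0, d))          # eigenvalues from 1e-3 to 1
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
s = 1.0                                     # step size 1/L with L = 1

x_gd = x = x_prev = np.ones(d)
for k in range(1, 501):
    x_gd = x_gd - s * grad(x_gd)              # gradient descent
    y = x + (k - 1) / (k + 2) * (x - x_prev)  # Nesterov momentum step
    x, x_prev = y - s * grad(y), x

print(f"after 500 steps: f(GD) = {f(x_gd):.3e},  f(Nesterov) = {f(x):.3e}")
```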

References

Showing 1–10 of 56 references.
Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced
It is rigorously proved that gradient flow effectively enforces the differences between squared norms across different layers to remain invariant without any explicit regularization, which implies that if the weights are initially small, gradient flow automatically balances the magnitudes of all layers.
Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
It is proved that overparameterized neural networks can learn some notable concept classes, including two- and three-layer networks with fewer parameters and smooth activations, via SGD (stochastic gradient descent) or its variants in polynomial time using polynomially many samples.
On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization
This paper suggests that, sometimes, increasing depth can speed up optimization, and proves that it is mathematically impossible to obtain the acceleration effect of overparametrization via gradients of any regularizer.
Spectrally-normalized margin bounds for neural networks
This bound is empirically investigated for a standard AlexNet network trained with SGD on the MNIST and CIFAR-10 datasets, with both original and random labels; the bound, the Lipschitz constants, and the excess risks are all in direct correlation, suggesting both that SGD selects predictors whose complexity scales with the difficulty of the learning task, and that the presented bound is sensitive to this complexity.
Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels
It is shown that under certain conditions, gradient descent achieves small error only if a related tangent kernel method achieves a non-trivial advantage over random guessing, though this advantage might be very small even when gradient descent can achieve arbitrarily high accuracy.
Backward Feature Correction: How Deep Learning Performs Deep Learning
This paper formally analyzes how multi-layer neural networks can perform hierarchical learning efficiently and automatically by applying SGD, and establishes a principle called "backward feature correction", where training higher layers in the network can improve the features of lower-level ones.
Enhanced convolutional neural tangent kernels. 2020. URL: https://openreview.net/forum?id=BkgNqkHFPr
Learning Parities with Neural Networks. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
It is shown that under certain distributions, sparse parities are learnable via gradient descent on a depth-two network, while under the same distributions these parities cannot be learned efficiently by linear methods.
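As a small, hedged illustration of the expressivity side of that separation (a classical hand-crafted construction, not the learning result itself): a depth-two ReLU network that computes the 2-bit parity (XOR), which no linear function of the raw inputs can represent.

```python
import numpy as np

# Sketch (classical hand-crafted weights, chosen for illustration): a depth-two
# ReLU network computing the 2-bit parity (XOR) of inputs in {0,1}^2.

def parity_net(x):
    h = np.maximum([x[0] + x[1], x[0] + x[1] - 1.0], 0.0)   # hidden layer
    return h[0] - 2.0 * h[1]                                 # output layer

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", parity_net(np.array(x, dtype=float)))
```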
Optimization and Generalization of Shallow Neural Networks with Quadratic Activation Functions
We study the dynamics of optimization and the generalization properties of one-hidden-layer neural networks with quadratic activation function in the over-parametrized regime where the layer width…
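A brief hedged sketch of the structure such analyses exploit (toy sizes of my own choosing): a one-hidden-layer network with quadratic activation is exactly a quadratic form in the input.

```python
import numpy as np

# Sketch (standard identity, toy check): a one-hidden-layer network with
# quadratic activation, f(x) = sum_j a_j (w_j . x)^2, equals the quadratic form
# x^T M x with M = sum_j a_j w_j w_j^T.

rng = np.random.default_rng(0)
d, h = 4, 6
W = rng.normal(size=(h, d))
a = rng.normal(size=h)
x = rng.normal(size=d)

f = np.sum(a * (W @ x)**2)
M = (W.T * a) @ W                     # sum_j a_j w_j w_j^T
print(np.isclose(f, x @ M @ x))       # True
```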