# Noether: The More Things Change, the More Stay the Same

```bibtex
@article{Gluch2021NoetherTM,
  title   = {Noether: The More Things Change, the More Stay the Same},
  author  = {Grzegorz Gluch and R{\"u}diger L. Urbanke},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2104.05508}
}
```

Symmetries have proven to be important ingredients in the analysis of neural networks. So far their use has mostly been implicit or seemingly coincidental. We undertake a systematic study of the role that symmetry plays. In particular, we clarify how symmetry interacts with the learning algorithm. The key ingredient in our study is Noether’s celebrated theorem which, informally speaking, states that symmetry leads to conserved quantities (e.g., conservation of energy or conservation…

#### 3 Citations

Geometric Deep Learning and Equivariant Neural Networks

- Computer Science, Physics
- ArXiv
- 2021

The mathematical foundations of geometric deep learning are surveyed, focusing on group-equivariant and gauge-equivariant neural networks and the use of Fourier analysis involving Wigner matrices, spherical harmonics and Clebsch–Gordan coefficients for G = SO(3), illustrating the power of representation theory for deep learning.
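To make the notion of symmetry concrete, here is a minimal sketch (not the survey's framework) of the simplest symmetrization idea behind group-equivariant networks: averaging an arbitrary feature map over a finite rotation group yields an invariant feature. The feature map and the group C4 below are hypothetical choices for illustration.

```python
import numpy as np

def rotation(k):
    """2D rotation by k * 90 degrees (an element of the cyclic group C4)."""
    theta = k * np.pi / 2
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

def feature(x):
    """An arbitrary, non-invariant feature map (hypothetical example)."""
    return x[0] ** 3 + 2.0 * x[1]

def invariant_feature(x):
    """Average the feature over all four rotations in C4: the result
    depends only on the orbit of x, hence is rotation-invariant."""
    return np.mean([feature(rotation(k) @ x) for k in range(4)])

x = np.array([0.3, -1.1])
g_x = rotation(1) @ x  # rotate the input by 90 degrees

# The raw feature changes under rotation; the averaged one does not.
print(abs(feature(x) - feature(g_x)) > 1e-6)                      # True
print(abs(invariant_feature(x) - invariant_feature(g_x)) < 1e-9)  # True
```

Equivariant networks refine this idea: instead of averaging away the group action, each layer commutes with it, which is where the representation-theoretic tools (Wigner matrices, Clebsch–Gordan coefficients) enter for continuous groups like SO(3).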

Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances

- Computer Science
- ICML
- 2021

It is shown that adding one extra neuron to each layer is sufficient to connect all these previously discrete minima into a single manifold, providing new insights into the minimization of the non-convex loss function of overparameterized neural networks.

Rethinking the Variational Interpretation of Nesterov's Accelerated Method

- Mathematics
- 2021

The continuous-time model of Nesterov’s momentum provides a thought-provoking perspective for understanding the nature of the acceleration phenomenon in convex optimization. One of the main ideas in…

#### References

SHOWING 1-10 OF 56 REFERENCES

Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced

- Computer Science, Mathematics
- NeurIPS
- 2018

It is rigorously proved that gradient flow keeps the differences between squared norms across layers invariant without any explicit regularization, which implies that if the weights are initially small, gradient flow automatically balances the magnitudes of all layers.
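The conservation law above can be checked numerically. The following sketch (a small step size stands in for continuous gradient flow; the model and data are illustrative choices, not the paper's setup) trains the two-layer linear model f(x) = (w2 · w1) x by gradient descent and observes that the balancedness quantity ||w1||² − ||w2||² barely moves, even though the weights themselves change substantially.

```python
import numpy as np

# Two-layer linear model f(x) = (w2 . w1) x fit to the target f(x) = 2x.
w1 = np.array([1.0, 0.5, -0.5, 0.2])   # first-layer weights
w2 = np.array([0.5, -1.0, 0.3, 0.4])   # second-layer weights
xs = np.linspace(-1.0, 1.0, 32)
ys = 2.0 * xs                          # target function

def balance(u, v):
    """The conserved quantity: difference of squared layer norms."""
    return np.dot(u, u) - np.dot(v, v)

b0 = balance(w1, w2)
lr = 1e-3                              # small step approximates gradient flow
for _ in range(20000):
    r = np.dot(w2, w1) * xs - ys       # residuals on the batch
    c = np.mean(r * xs)                # shared scalar in both gradients
    g1, g2 = c * w2, c * w1            # dL/dw1, dL/dw2
    w1 -= lr * g1
    w2 -= lr * g2

print(abs(np.dot(w2, w1) - 2.0) < 1e-3)   # the product converged to 2...
print(abs(balance(w1, w2) - b0) < 1e-4)   # ...but the balance barely drifted
```

Under exact gradient flow the drift is zero, since w1 · (dL/dw1) = w2 · (dL/dw2) = c (w1 · w2); the tiny residual drift here is discretization error of order lr².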

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers

- Computer Science, Mathematics
- NeurIPS
- 2019

It is proved that overparameterized neural networks can learn some notable concept classes, including those represented by two- and three-layer networks with fewer parameters and smooth activations, using SGD (stochastic gradient descent) or its variants in polynomial time with polynomially many samples.

On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization

- Computer Science, Mathematics
- ICML
- 2018

This paper suggests that, sometimes, increasing depth can speed up optimization and proves that it is mathematically impossible to obtain the acceleration effect of overparametrization via gradients of any regularizer.

Spectrally-normalized margin bounds for neural networks

- Computer Science, Mathematics
- NIPS
- 2017

This bound is empirically investigated for a standard AlexNet network trained with SGD on the MNIST and CIFAR-10 datasets, with both original and random labels; the bound, the Lipschitz constants, and the excess risks are all in direct correlation, suggesting both that SGD selects predictors whose complexity scales with the difficulty of the learning task, and that the presented bound is sensitive to this complexity.

Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels

- Computer Science, Mathematics
- ICML
- 2021

It is shown that under certain conditions, gradient descent achieves small error only if a related tangent kernel method achieves a non-trivial advantage over random guessing, though this advantage might be very small even when gradient descent can achieve arbitrarily high accuracy.

Backward Feature Correction: How Deep Learning Performs Deep Learning

- Computer Science, Mathematics
- ArXiv
- 2020

This paper formally analyzes how multi-layer neural networks can perform hierarchical learning efficiently and automatically by applying SGD and establishes a principle called "backward feature correction", where training higher layers in the network can improve the features of lower level ones.

Enhanced Convolutional Neural Tangent Kernels

- URL https://openreview.net/forum?id=BkgNqkHFPr
- 2020

Learning Parities with Neural Networks

- Computer Science, Mathematics
- NeurIPS
- 2020

It is shown that under certain distributions, sparse parities are learnable via gradient descent on a depth-two network; on the other hand, under the same distributions, these parities cannot be learned efficiently by linear methods.
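As a minimal illustration of the architecture in question (a hand-constructed sketch, not the paper's learned solution or its learning result), a depth-two ReLU network with just two hidden units can represent the 2-bit parity (XOR) exactly:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Hidden layer: both units read the sum x1 + x2, with biases 0 and -1.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
# Output layer: h1 - 2*h2 folds the ramp back down at x1 + x2 = 2,
# so the output is relu(s) - 2*relu(s - 1) with s = x1 + x2.
w2 = np.array([1.0, -2.0])

def parity_net(x):
    return float(w2 @ relu(W1 @ x + b1))

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, parity_net(np.array(x, dtype=float)))
# (0, 0) -> 0.0, (0, 1) -> 1.0, (1, 0) -> 1.0, (1, 1) -> 0.0
```

The paper's contribution is the harder statement that gradient descent actually finds such solutions for sparse parities under suitable distributions, while linear methods provably cannot.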


Optimization and Generalization of Shallow Neural Networks with Quadratic Activation Functions

- Computer Science, Mathematics
- NeurIPS
- 2020

We study the dynamics of optimization and the generalization properties of one-hidden layer neural networks with quadratic activation function in the over-parametrized regime where the layer width…