# Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances

@article{Simsek2021GeometryOT, title={Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances}, author={Berfin {\c{S}}im{\c{s}}ek and Fran{\c{c}}ois Gaston Ged and Arthur Jacot and Francesco Spadaro and Cl{\'e}ment Hongler and Wulfram Gerstner and Johanni Brea}, journal={ArXiv}, year={2021}, volume={abs/2105.12221} }

We study how permutation symmetries in overparameterized multi-layer neural networks generate ‘symmetry-induced’ critical points. Assuming a network with L layers of minimal widths r*_1, …, r*_{L-1} reaches a zero-loss minimum at r*_1! · · · r*_{L-1}! isolated points that are permutations of one another, we show that adding one extra neuron to each layer is sufficient to connect all these previously discrete minima into a single manifold. For a two-layer overparameterized network of width r…
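The permutation count in the abstract can be checked directly in a toy setting: for a two-layer network, relabeling the hidden units (permuting the rows of the first weight matrix together with the corresponding output weights) leaves the network function unchanged, so a width-r hidden layer yields r! parameter points with identical loss. This is a minimal illustrative sketch, not code from the paper; all names here are made up:

```python
# Illustrative sketch: permuting the hidden neurons of a two-layer network
# leaves its output unchanged, so any zero-loss minimum reappears at r!
# permuted parameter points.
import itertools
import numpy as np

rng = np.random.default_rng(0)
r = 3                          # hidden width
W1 = rng.normal(size=(r, 4))   # input-to-hidden weights
w2 = rng.normal(size=r)        # hidden-to-output weights
x = rng.normal(size=4)

def f(W1, w2, x):
    # Generic smooth activation; any pointwise nonlinearity works the same way.
    return w2 @ np.tanh(W1 @ x)

base = f(W1, w2, x)
# Every permutation of the hidden units gives the same output value.
for perm in itertools.permutations(range(r)):
    p = list(perm)
    assert np.isclose(f(W1[p], w2[p], x), base)
print(len(list(itertools.permutations(range(r)))))  # 3! = 6 equivalent points
```

The same argument applies layer by layer in deeper networks, which is where the product r*_1! · · · r*_{L-1}! comes from.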


## 11 Citations

### Embedding Principle: a hierarchical structure of loss landscape of deep neural networks

- Computer Science, Journal of Machine Learning
- 2022

It is shown that the loss landscape of an NN contains all critical points of all narrower NNs, and that any critical embedding has an irreversibility property: the number of negative/zero/positive eigenvalues of the Hessian at a critical point may increase but never decrease as the NN becomes wider through the embedding.

### Embedding Principle of Loss Landscape of Deep Neural Networks

- Computer Science, NeurIPS
- 2021

This work proves an embedding principle stating that the loss landscape of a DNN “contains” all the critical points of all narrower DNNs, and proposes a critical embedding such that any critical point of a narrower DNN can be embedded into a critical point/affine subspace of the target DNN with higher degeneracy while preserving the DNN output function.
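One standard way such an embedding works is neuron splitting: duplicate a hidden unit and split its outgoing weight between the copies. The following is a hedged sketch of that idea (assumed notation, not the paper's exact construction):

```python
# Sketch of a one-neuron "splitting" embedding: duplicating hidden unit 0 and
# splitting its outgoing weight preserves the network output, so a width-r
# parameter point embeds into the width-(r+1) landscape.
import numpy as np

rng = np.random.default_rng(2)
r, d = 3, 4
W1 = rng.normal(size=(r, d))
w2 = rng.normal(size=r)
x = rng.normal(size=d)

def f(W1, w2, x):
    return w2 @ np.tanh(W1 @ x)

# Duplicate the first hidden unit; split its output weight as alpha, 1 - alpha.
alpha = 0.3
W1_wide = np.vstack([W1, W1[:1]])           # width r + 1, last row = copy of row 0
w2_wide = np.concatenate([w2, [0.0]])
w2_wide[0], w2_wide[-1] = alpha * w2[0], (1 - alpha) * w2[0]

assert np.isclose(f(W1_wide, w2_wide, x), f(W1, w2, x))
```

Because alpha is a free parameter, a single critical point of the narrow network embeds as a whole line (more generally, an affine subspace) of the wider landscape, which is the degeneracy the blurb refers to.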

### Saddle-to-Saddle Dynamics in Deep Linear Networks: Small Initialization Training, Symmetry, and Sparsity

- Computer Science
- 2021

A Saddle-to-Saddle dynamics is conjectured: throughout training, gradient descent visits the neighborhoods of a sequence of saddles, each corresponding to linear maps of increasing rank, until reaching a sparse global minimum.

### The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks

- Mathematics, Computer Science, ICLR
- 2022

If the permutation invariance of neural networks is taken into account, SGD solutions will likely have no barrier along the linear interpolation between them, which has implications for the lottery ticket hypothesis, distributed training, and ensemble methods.
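The alignment step can be sketched in a toy setting: before interpolating two networks, match the hidden units of one model to the other so that the interpolation happens between corresponding neurons. This sketch uses a simple greedy nearest-row matching (not the paper's algorithm) on synthetic weights:

```python
# Hedged sketch of permutation-aligned interpolation: model B is a permuted,
# slightly perturbed copy of model A; after matching hidden units, the two
# parameter vectors are close and the linear path between them stays nearby.
import numpy as np

rng = np.random.default_rng(1)
r, d = 4, 3
A = rng.normal(size=(r, d))                    # hidden weights of model A
perm = rng.permutation(r)
B = A[perm] + 0.01 * rng.normal(size=(r, d))   # model B: A permuted plus noise

# Greedy matching: assign each row of B to its nearest unused row of A.
assigned = set()
match = np.empty(r, dtype=int)
for i in range(r):
    dists = [np.inf if j in assigned else np.linalg.norm(B[i] - A[j])
             for j in range(r)]
    j = int(np.argmin(dists))
    assigned.add(j)
    match[j] = i                               # A's row j pairs with B's row i

B_aligned = B[match]
mid = 0.5 * (A + B_aligned)                    # midpoint after alignment
print(np.linalg.norm(mid - A))                 # small: aligned nets are nearby
```

Without the alignment, the raw midpoint 0.5 * (A + B) averages unrelated neurons, which is the mechanism behind the loss barriers the blurb says the permutation removes.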

### Noether's Learning Dynamics: Role of Symmetry Breaking in Neural Networks

- Computer Science
- 2021

A theoretical framework is developed to study the geometry of learning dynamics in neural networks, and a key mechanism of explicit symmetry breaking is revealed behind the efficiency and stability of modern neural networks.

### Embedding Principle in Depth for the Loss Landscape Analysis of Deep Neural Networks

- Computer Science, ArXiv
- 2022

An embedding principle in depth is discovered: the loss landscape of an NN “contains” all critical points of the loss landscapes of shallower NNs, which serves as a solid foundation for further study of the role of depth in DNNs.

### Random initialisations performing above chance and how to find them

- Computer Science
- 2022

A simple but powerful algorithm is used to obtain direct empirical evidence that any two solutions found by SGD can be permuted such that the linear interpolation between their parameters forms a path without significant increases in loss.

### Symmetry Teleportation for Accelerated Optimization

- Computer Science, ArXiv
- 2022

This work derives the loss-invariant group actions for test functions and multi-layer neural networks, proves a necessary condition for when teleportation improves the convergence rate, and shows that the algorithm is closely related to second-order methods.

### Neural networks embrace learned diversity

- Education, ArXiv
- 2022

Diversity conveys advantages in nature, yet homogeneous neurons typically comprise the layers of artificial neural networks. Here we construct neural networks from neurons that learn their own…

### A Topological Centrality Measure for Directed Networks

- Computer Science, ArXiv
- 2022

A new metric for computing centrality in directed weighted networks, the quasi-centrality measure, is introduced, along with a method that gives a hierarchical representation of the topological influences of nodes in a directed network.

## References

Showing 1-10 of 38 references

### Weight-space symmetry in deep networks gives rise to permutation saddles, connected by equal-loss valleys across the loss landscape

- Computer Science, ArXiv
- 2019

The geometric approach yields a lower bound on the number of critical points generated by weight-space symmetries and provides a simple intuitive link between previous mathematical results and numerical observations.

### Topology and Geometry of Half-Rectified Network Optimization

- Computer Science, ICLR
- 2017

The main theoretical contribution is to prove that half-rectified single layer networks are asymptotically connected, and an algorithm is introduced to efficiently estimate the regularity of such sets on large-scale networks.

### The critical locus of overparameterized neural networks

- Computer Science, ArXiv
- 2020

The results in this paper provide a starting point to a more quantitative understanding of the properties of various components of the critical locus of the loss function $L$ of overparameterized feedforward neural networks.

### Empirical Analysis of the Hessian of Over-Parametrized Neural Networks

- Computer Science, ICLR
- 2018

A case is made that links two observations: small- and large-batch gradient descent appear to converge to different basins of attraction but are in fact connected through their flat regions, and so belong to the same basin.

### Large Scale Structure of Neural Network Loss Landscapes

- Computer Science, NeurIPS
- 2019

This work proposes and experimentally verifies a unified phenomenological model of the loss landscape as a set of high-dimensional wedges that together form a large-scale, interconnected structure toward which optimization is drawn.

### Gradient Descent Provably Optimizes Over-parameterized Neural Networks

- Computer Science, ICLR
- 2019

Over-parameterization and random initialization jointly restrict every weight vector to be close to its initialization for all iterations, which allows a strong convexity-like property to show that gradient descent converges at a global linear rate to the global optimum.

### Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent

- Computer Science, NeurIPS
- 2019

This work shows that for wide NNs the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters.

### Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks

- Computer Science, ICML
- 2019

This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: a tighter characterization of training speed, an explanation for why training a neural net with random labels leads to slower training, and a data-dependent complexity measure.

### Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel

- Computer Science, NeurIPS
- 2020

A large-scale phenomenological analysis of training reveals a striking correlation between a diverse set of metrics over training time, governed by a rapid chaotic to stable transition in the first few epochs, that together poses challenges and opportunities for the development of more accurate theories of deep learning.

### Semi-flat minima and saddle points by embedding neural networks to overparameterization

- Computer Science, NeurIPS
- 2019

The results show that networks with smooth and ReLU activations have different partially flat landscapes around the embedded point, and relate these results to a difference in their generalization abilities under overparameterized realization.