Symmetries, flat minima, and the conserved quantities of gradient flow

Bo Zhao, Iordan Ganev, Robin Walters, Rose Yu and Nima Dehmamy
Empirical studies of the loss landscape of deep networks have revealed that many local minima are connected through low-loss valleys. Yet, little is known about the theoretical origin of such valleys. We present a general framework for finding continuous symmetries in the parameter space, which carve out low-loss valleys. Our framework uses equivariances of the activation functions and can be applied to different layer architectures. To generalize this framework to nonlinear neural networks, we… 
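A minimal sketch of the kind of continuous symmetry this paper studies (illustrative, not the authors' code): for a one-hidden-unit ReLU network, rescaling the incoming weight by a > 0 and the outgoing weight by 1/a leaves the output unchanged, so every minimum extends to a one-parameter low-loss valley.

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

def f(w1, w2, x):
    # Toy network: f(x) = w2 * relu(w1 * x)
    return w2 * relu(w1 * x)

x = np.linspace(-2, 2, 9)
w1, w2 = 1.5, -0.8
# relu is positively homogeneous, so relu(a*w1*x) = a*relu(w1*x) for a > 0,
# and the rescaling (w1, w2) -> (a*w1, w2/a) is a symmetry of the loss.
for a in [0.5, 2.0, 10.0]:
    assert np.allclose(f(w1, w2, x), f(a * w1, w2 / a, x))
print("outputs identical along the rescaling orbit")
```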



Understanding the Dynamics of Gradient Flow in Overparameterized Linear Models

This work provides a detailed analysis of the dynamics of gradient flow in overparameterized two-layer linear models and establishes interesting mathematical connections between matrix factorization problems and differential equations of the Riccati type.
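A hedged numerical sketch of these dynamics (not the paper's code): for a two-layer linear model trained by (approximate) gradient flow on a matrix-factorization loss, the "imbalance" W1 W1^T - W2^T W2 is a conserved quantity; small Euler steps preserve it almost exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4)) * 0.1   # first layer
W2 = rng.normal(size=(2, 3)) * 0.1   # second layer
M = rng.normal(size=(2, 4))          # target matrix

def imbalance(W1, W2):
    return W1 @ W1.T - W2.T @ W2

I0 = imbalance(W1, W2)
lr = 1e-3                            # small step approximates continuous time
for _ in range(20000):
    R = W2 @ W1 - M                  # residual of L = 0.5 * ||W2 W1 - M||^2
    g1 = W2.T @ R                    # dL/dW1
    g2 = R @ W1.T                    # dL/dW2
    W1 -= lr * g1
    W2 -= lr * g2

drift = np.abs(imbalance(W1, W2) - I0).max()
print(drift)  # near zero: the imbalance is (approximately) conserved
```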

Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics

By exploiting symmetry, this work analytically describes the learning dynamics of various parameter combinations at finite learning rates and batch sizes, for state-of-the-art architectures trained on any dataset.
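One such symmetry-derived conservation law can be sketched as follows (an illustrative example, not the paper's code): if a loss is scale-invariant in a parameter vector w, i.e. L(a*w) = L(w) for a > 0, then its gradient is orthogonal to w, so gradient flow conserves ||w||^2; finite learning rates and weight decay break this law.

```python
import numpy as np

def loss(w, x=np.array([1.0, -2.0, 0.5])):
    # Scale-invariant toy loss: depends only on the direction of w.
    u = w / np.linalg.norm(w)
    return -float(u @ x)

def num_grad(f, w, eps=1e-6):
    # Central-difference numerical gradient.
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w); e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

w = np.array([0.3, 1.2, -0.7])
g = num_grad(loss, w)
print(abs(g @ w))  # ~0: under gradient flow, d/dt ||w||^2 = -2 g.w = 0
```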

Empirical Analysis of the Hessian of Over-Parametrized Neural Networks

This work presents a case linking two observations: small-batch and large-batch gradient descent appear to converge to different basins of attraction, but are in fact connected through a flat region and thus belong to the same basin.

On Accelerated Methods in Optimization

It is shown that the family of Lagrangians is closed under time dilation (an orbit under the action of speeding up time), which demonstrates the universality of this Lagrangian view of acceleration in optimization.

Invariante Variationsprobleme (Invariant Variational Problems)

  • Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse
  • 1918

Universal approximation and model compression for radial neural networks

We introduce a class of fully-connected neural networks whose activation functions, rather than being pointwise, rescale feature vectors by a function depending only on their norm. We call such networks radial neural networks.

Rethinking the limiting dynamics of SGD: modified loss, phase space oscillations, and anomalous diffusion

This work finds empirically that long after performance has converged, networks continue to move through parameter space by a process of anomalous diffusion in which distance travelled grows as a power law in the number of gradient updates with a nontrivial exponent.

Noether's Learning Dynamics: Role of Symmetry Breaking in Neural Networks

A theoretical framework is developed to study the geometry of learning dynamics in neural networks, and a key mechanism of explicit symmetry breaking is revealed behind the efficiency and stability of modern neural networks.

Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling

This paper shows how to efficiently build simplicial complexes for fast ensembling, outperforming independently trained deep ensembles in accuracy, calibration, and robustness to dataset shift.

Asymmetric Valleys: Beyond Sharp and Flat Local Minima

It is proved that for asymmetric valleys, a solution biased towards the flat side generalizes better than the exact minimizer, which provides a theoretical explanation for the intriguing phenomenon observed by Izmailov et al. (2018).
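The intuition behind this result can be sketched with a toy example (illustrative, not the paper's construction): in a 1D valley that rises sharply on one side of the minimum and slowly on the other, random parameter perturbations (a proxy for the train/test shift) give lower expected loss at a point biased toward the flat side than at the exact minimizer.

```python
import numpy as np

def loss(w):
    # Asymmetric valley: sharp for w < 0, flat for w > 0.
    return np.where(w < 0, 10 * w**2, 0.1 * w**2)

rng = np.random.default_rng(0)
noise = rng.normal(scale=0.5, size=100_000)

exact = np.mean(loss(0.0 + noise))   # expected loss at the minimizer w = 0
biased = np.mean(loss(0.3 + noise))  # shifted toward the flat (right) side
print(exact, biased)                 # the biased solution does better
```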