Git Re-Basin: Merging Models modulo Permutation Symmetries

@article{Ainsworth2022GitRM,
  title={Git Re-Basin: Merging Models modulo Permutation Symmetries},
  author={Samuel K. Ainsworth and Jonathan Hayase and Siddhartha S. Srinivasa},
  journal={ArXiv},
  year={2022},
  volume={abs/2209.04836}
}
The success of deep learning is due in large part to our ability to solve certain massive non-convex optimization problems with relative ease. Though non-convex optimization is NP-hard, simple algorithms – often variants of stochastic gradient descent – exhibit surprising effectiveness in fitting large neural networks in practice. We argue that neural network loss landscapes often contain (nearly) a single basin after accounting for all possible permutation symmetries of hidden units a la…

References

Linear Mode Connectivity and the Lottery Ticket Hypothesis

This work finds that standard vision models become stable to SGD noise in this way early in training, and uses this technique to study iterative magnitude pruning (IMP), the procedure used by work on the lottery ticket hypothesis to identify subnetworks that could have trained in isolation to full accuracy.

Learning Multiple Layers of Features from Tiny Images

It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.

ImageNet: A large-scale hierarchical image database

A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.

The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks

If the permutation invariance of neural networks is taken into account, SGD solutions will likely have no barrier in the linear interpolation between them, which has implications for lottery ticket hypothesis, distributed training and ensemble methods.

Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling

This paper shows how to efficiently build simplicial complexes for fast ensembling, outperforming independently trained deep ensembles in accuracy, calibration, and robustness to dataset shift.

Optimizing Mode Connectivity via Neuron Alignment

This work proposes a more general framework to investigate the effect of symmetry on landscape connectivity by accounting for the weight permutations of the networks being connected by introducing an inexpensive heuristic referred to as neuron alignment.

Fast Differentiable Sorting and Ranking

This paper proposes the first differentiable sorting and ranking operators with O(n log n) time and space complexity, and achieves this feat by constructing differentiable operators as projections onto the permutahedron, the convex hull of permutations, and using a reduction to isotonic optimization.

Model Fusion via Optimal Transport

This work presents a layer-wise model fusion algorithm for neural networks that utilizes optimal transport to (soft-) align neurons across the models before averaging their associated parameters, and shows that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.

On implementing 2D rectangular assignment algorithms

  • D. Crouse
  • Computer Science
  • IEEE Transactions on Aerospace and Electronic Systems
  • 2016

This paper reviews research into solving the two-dimensional (2D) rectangular assignment problem and combines the best methods to implement a k-best 2D rectangular assignment algorithm with bounded

Deep Residual Learning for Image Recognition

This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
...