Corpus ID: 209324341

Linear Mode Connectivity and the Lottery Ticket Hypothesis

@article{Frankle2020LinearMC,
  title={Linear Mode Connectivity and the Lottery Ticket Hypothesis},
  author={Jonathan Frankle and Gintare Karolina Dziugaite and Daniel M. Roy and Michael Carbin},
  journal={ArXiv},
  year={2020},
  volume={abs/1912.05671}
}
We study whether a neural network optimizes to the same, linearly connected minimum under different samples of SGD noise (e.g., random data order and augmentation). We find that standard vision models become stable to SGD noise in this way early in training. From then on, the outcome of optimization is determined to a linearly connected region. We use this technique to study iterative magnitude pruning (IMP), the procedure used by work on the lottery ticket hypothesis to identify subnetworks… 
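The stability analysis described in the abstract boils down to a simple procedure: train two copies of a network from a shared starting point under different samples of SGD noise (data order, augmentation), then measure how much test error rises along the straight line between their weights. Below is a minimal sketch of that interpolation check in PyTorch; the helper names and the assumption that `model_a` and `model_b` come from such paired runs are illustrative, not the paper's released code.

```python
import copy
import torch

def interpolate_state_dicts(sd_a, sd_b, alpha):
    """Blend two state dicts: (1 - alpha) * sd_a + alpha * sd_b, key by key."""
    out = {}
    for k in sd_a:
        if sd_a[k].is_floating_point():
            out[k] = (1 - alpha) * sd_a[k] + alpha * sd_b[k]
        else:
            # Integer buffers (e.g. BatchNorm's num_batches_tracked) are copied as-is.
            out[k] = sd_a[k]
    return out

@torch.no_grad()
def test_error(model, loader, device="cpu"):
    """Fraction of misclassified examples on the given loader."""
    model.eval().to(device)
    wrong, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        wrong += (model(x).argmax(dim=1) != y).sum().item()
        total += y.numel()
    return wrong / total

def linear_interpolation_instability(model_a, model_b, loader, steps=11):
    """Peak error along the linear path minus the mean error of the two endpoints.
    A value near zero means the two runs are linearly mode connected."""
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    probe = copy.deepcopy(model_a)
    errors = []
    for i in range(steps):
        alpha = i / (steps - 1)  # evenly spaced points on the line, including endpoints
        probe.load_state_dict(interpolate_state_dicts(sd_a, sd_b, alpha))
        errors.append(test_error(probe, loader))
    return max(errors) - (errors[0] + errors[-1]) / 2
```

In the paper's terms, a network is stable to SGD noise once this instability is roughly zero, and the sparse IMP subnetworks are reported to reach full accuracy only when they are themselves stable in this sense.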

Citations

Interspace Pruning: Using Adaptive Filter Representations to Improve Training of Sparse CNNs
TLDR
Interspace pruning (IP) is introduced, a general tool to improve existing pruning methods; IP greatly exceeds SP at equal runtime and parameter cost, and its advances are shown to stem from improved trainability and superior generalization ability.
Learning Neural Network Subspaces
TLDR
This work uses the subspace midpoint to boost accuracy, calibration, and robustness to label noise, outperforming Stochastic Weight Averaging and approaching the ensemble performance of independently trained networks without the training cost.
Linear Mode Connectivity in Multitask and Continual Learning
TLDR
It is empirically found that different minima of the same task are typically connected by very simple curves of low error, and this finding is exploited to propose an effective algorithm that constrains the sequentially learned minima to behave as the multitask solution.
Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free
TLDR
This work proposes a novel Trojan network detection regime: first locating a “winning Trojan lottery ticket” which preserves nearly full Trojan information yet only chance-level performance on clean inputs; then recovering the trigger embedded in this already isolated subnetwork.
Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective
TLDR
It is shown that PAC-Bayesian theory can provide an explicit understanding of the relationship between the LTH and generalization behavior; existing algorithms for finding winning tickets are revisited from a PAC-Bayesian perspective, yielding new insights into these methods.
Rare Gems: Finding Lottery Tickets at Initialization
TLDR
Gem-Miner is proposed, which finds lottery tickets at initialization that train to better accuracy than current baselines, and does so up to 19× faster.
An Operator Theoretic Perspective on Pruning Deep Neural Networks
TLDR
It is shown that Koopman operator theory-based algorithms can be equivalent to magnitude and gradient based pruning, unifying these seemingly disparate methods, and that they can be used to shed light on magnitude pruning’s performance during early training.
An Operator Theoretic View on Pruning Deep Neural Networks
TLDR
It is shown that Koopman operator theory algorithms can be equivalent to magnitude and gradient based pruning, unifying these seemingly disparate methods, and found that they can be used to shed light on magnitude pruning’s performance during the early part of training.
When to Prune? A Policy towards Early Structural Pruning
TLDR
This work introduces an Early Pruning Indicator (EPI) that relies on sub-network architectural similarity and quickly triggers pruning when the sub-network's architecture stabilizes, and offers a new efficiency-accuracy boundary for network pruning during training.
The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks
TLDR
If the permutation invariance of neural networks is taken into account, SGD solutions will likely have no barrier in the linear interpolation between them, which has implications for the lottery ticket hypothesis, distributed training, and ensemble methods.
…

References

SHOWING 1-10 OF 40 REFERENCES
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
TLDR
This work finds that dense, randomly-initialized, feed-forward networks contain subnetworks ("winning tickets") that - when trained in isolation - reach test accuracy comparable to the original network in a similar number of iterations, and articulate the "lottery ticket hypothesis".
The State of Sparsity in Deep Neural Networks
TLDR
It is shown that unstructured sparse architectures learned through pruning cannot be trained from scratch to the same test set performance as a model trained with joint sparsification and optimization, and the need for large-scale benchmarks in the field of model compression is highlighted.
Rethinking the Value of Network Pruning
TLDR
It is found that with optimal learning rate, the "winning ticket" initialization as used in Frankle & Carbin (2019) does not bring improvement over random initialization, and the need for more careful baseline evaluations in future research on structured pruning methods is suggested.
Learning both Weights and Connections for Efficient Neural Network
TLDR
A method to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy by learning only the important connections, and prunes redundant connections using a three-step method.
The Early Phase of Neural Network Training
TLDR
It is found that deep networks are not robust to reinitializing with random weights while maintaining signs, and that weight distributions are highly non-independent even after only a few hundred iterations.
Picking Winning Tickets Before Training by Preserving Gradient Flow
TLDR
This work argues that efficient training requires preserving the gradient flow through the network, and proposes a simple but effective pruning criterion called Gradient Signal Preservation (GraSP), which achieves significantly better performance than the baseline at extreme sparsity levels.
Winning the Lottery with Continuous Sparsification
TLDR
Continuous Sparsification is proposed, a new algorithm to search for winning tickets which continuously removes parameters from a network during training, and learns the sub-network's structure with gradient-based methods instead of relying on pruning strategies.
What’s Hidden in a Randomly Weighted Neural Network?
TLDR
It is empirically shown that as randomly weighted neural networks with fixed weights grow wider and deeper, an "untrained subnetwork" approaches a network with learned weights in accuracy.
…