Corpus ID: 221802286

Pruning Neural Networks at Initialization: Why are We Missing the Mark?

@article{Frankle2021PruningNN,
  title={Pruning Neural Networks at Initialization: Why are We Missing the Mark?},
  author={Jonathan Frankle and Gintare Karolina Dziugaite and Daniel M. Roy and Michael Carbin},
  journal={ArXiv},
  year={2021},
  volume={abs/2009.08576}
}
Recent work has explored the possibility of pruning neural networks at initialization. We assess proposals for doing so: SNIP (Lee et al., 2019), GraSP (Wang et al., 2020), SynFlow (Tanaka et al., 2020), and magnitude pruning. Although these methods surpass the trivial baseline of random pruning, they remain below the accuracy of magnitude pruning after training, and we endeavor to understand why. We show that, unlike pruning after training, accuracy is the same or higher when randomly…
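The kind of layerwise randomization the abstract points to can be sketched in a few lines: given each layer's binary pruning mask, shuffle which positions are kept while leaving that layer's sparsity untouched. The following is a minimal NumPy sketch under assumed names (shuffle_masks_within_layers, a masks dict keyed by layer), not the paper's code.

```python
# Hypothetical sketch (not the paper's code) of a layerwise randomization:
# shuffle which weights a mask keeps within each layer, preserving sparsity.
import numpy as np

def shuffle_masks_within_layers(masks, seed=0):
    """Return new binary masks with the same per-layer sparsity but random positions."""
    rng = np.random.default_rng(seed)
    shuffled = {}
    for name, mask in masks.items():
        flat = mask.ravel().copy()
        rng.shuffle(flat)                     # permute positions, keep the count of ones
        shuffled[name] = flat.reshape(mask.shape)
    return shuffled

# Toy example: two layers pruned to roughly 80% sparsity.
rng = np.random.default_rng(1)
masks = {
    "fc1.weight": (rng.random((256, 784)) > 0.8).astype(np.float32),
    "fc2.weight": (rng.random((10, 256)) > 0.8).astype(np.float32),
}
shuffled = shuffle_masks_within_layers(masks)
assert all(shuffled[k].sum() == masks[k].sum() for k in masks)   # sparsity unchanged per layer
```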
Citations

Emerging Paradigms of Neural Network Pruning
A general pruning framework is proposed so that the emerging pruning paradigms can be accommodated alongside the traditional one, and open questions are summarized as worthy future directions.
Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot
Experimental results show that zero-shot random tickets outperform or attain performance similar to existing "initial tickets", and a new method called "hybrid tickets", which achieves further improvement, is proposed.
Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win
It is shown that sparse NNs have poor gradient flow at initialization, a modified initialization for unstructured connectivity is proposed, and dynamic sparse training (DST) methods are found to significantly improve gradient flow during training over traditional sparse training methods.
Successive Pruning for Model Compression via Rate Distortion Theory
NN compression is studied from an information-theoretic perspective, showing that rate-distortion theory suggests pruning to achieve the theoretical limits of NN compression, and an end-to-end compression pipeline involving a novel pruning strategy is provided.
Lottery Ticket Implies Accuracy Degradation, Is It a Desirable Phenomenon?
The condition and rationale behind the winning-ticket property are investigated, and the underlying reason is found to be largely attributable to the correlation between initialized weights and final trained weights when the learning rate is not sufficiently large.
Network Compression for Machine-Learnt Fluid Simulations
Multi-scale, multi-fidelity numerical simulations form the pillar of scientific applications related to numerically modeling fluids. However, simulating the fluid behavior characterized by the…
Towards Efficient Convolutional Network Models with Filter Distribution Templates
A small set of templates consisting of easy-to-implement, intuitive, and aggressive variations of the original pyramidal distribution of filters in VGG and ResNet architectures is introduced, showing that models produced by these templates are more efficient in terms of parameter count and memory needs.
Adapting by Pruning: A Case Study on BERT
A novel model-adaptation paradigm, adapting by pruning, is proposed: neural connections in the pre-trained model are pruned to optimise performance on the target task, while all remaining connections keep their weights intact.
Dense for the Price of Sparse: Improved Performance of Sparsely Initialized Networks via a Subspace Offset
It is shown that standard training of networks built with the proposed layers, and pruned at initialization, achieves state-of-the-art accuracy for extreme sparsities on a variety of benchmark network architectures and datasets.
Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training
A new perspective on training deep neural networks to state-of-the-art performance without expensive dense over-parameterization is introduced by proposing the concept of In-Time Over-Parameterization (ITOP) in sparse training.

References

Showing 1-10 of 40 references
What is the State of Neural Network Pruning?
Issues with current practices in pruning are identified, concrete remedies are suggested, and ShrinkBench, an open-source framework that facilitates standardized evaluation and comparison of pruning methods, is introduced.
SNIP: Single-shot Network Pruning based on Connection Sensitivity
This work presents an approach that prunes a given network once at initialization, prior to training, and introduces a saliency criterion based on connection sensitivity that identifies structurally important connections in the network for the given task.
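To make the connection-sensitivity idea concrete, the sketch below scores each weight by |w * dL/dw| on a single mini-batch and keeps the globally top-scoring fraction. It is a hedged PyTorch sketch with assumed function names and a global-threshold choice made for illustration, not the authors' implementation.

```python
# A hedged PyTorch sketch of a SNIP-style score: sensitivity of the loss to each
# connection on one mini-batch, computed as |w * dL/dw|; names are assumptions.
import torch
import torch.nn as nn

def snip_saliency(model, inputs, targets, loss_fn=None):
    """Score every weight tensor (biases skipped) by |w * grad| on a single batch."""
    loss_fn = loss_fn or nn.CrossEntropyLoss()
    model.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    return {
        name: (p.detach() * p.grad.detach()).abs()
        for name, p in model.named_parameters()
        if p.grad is not None and p.dim() > 1
    }

def global_mask_from_scores(scores, sparsity=0.9):
    """Keep the globally top-scoring (1 - sparsity) fraction of weights."""
    flat = torch.cat([s.flatten() for s in scores.values()])
    k = max(1, int((1 - sparsity) * flat.numel()))
    threshold = torch.topk(flat, k).values.min()
    return {name: (s >= threshold).float() for name, s in scores.items()}
```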
To prune, or not to prune: exploring the efficacy of pruning for model compression
Across a broad range of neural network architectures, large-sparse models are found to consistently outperform small-dense models and to achieve up to a 10x reduction in the number of non-zero parameters with minimal loss in accuracy.
The Early Phase of Neural Network Training
It is found that deep networks are not robust to reinitializing with random weights while maintaining signs, and that weight distributions are highly non-independent even after only a few hundred iterations.
Comparing Rewinding and Fine-tuning in Neural Network Pruning
Fine-tuning is compared with alternative retraining techniques, and learning-rate rewinding is proposed, forming the basis of a network-agnostic pruning algorithm that matches the accuracy and compression ratios of several more network-specific state-of-the-art techniques.
Learning both Weights and Connections for Efficient Neural Network
A method to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy, by learning only the important connections and pruning redundant ones using a three-step train, prune, and retrain procedure.
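As an illustration of the magnitude-based step in that train/prune/retrain recipe, here is a minimal PyTorch sketch that builds per-layer masks from trained weight magnitudes; the names and the uniform per-layer sparsity choice are assumptions, not the paper's code.

```python
# A minimal sketch, under assumed names, of layerwise magnitude pruning:
# after training, zero the smallest-magnitude weights in each weight matrix,
# then fine-tune with the mask held fixed.
import torch

def magnitude_masks(model, sparsity=0.9):
    """Per-layer binary masks keeping the largest-magnitude (1 - sparsity) fraction."""
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() <= 1:                          # skip biases and norm parameters
            continue
        k = int(sparsity * p.numel())
        if k == 0:
            masks[name] = torch.ones_like(p)
            continue
        threshold = p.detach().abs().flatten().kthvalue(k).values
        masks[name] = (p.detach().abs() > threshold).float()
    return masks

def apply_masks(model, masks):
    """Zero pruned weights in place; reapply after each optimizer step to keep them at zero."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])
```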
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
This work finds that dense, randomly initialized, feed-forward networks contain subnetworks ("winning tickets") that, when trained in isolation, reach test accuracy comparable to the original network in a similar number of iterations, and articulates the "lottery ticket hypothesis".
Pruning Filters for Efficient ConvNets
This work presents an acceleration method for CNNs, showing that even simple filter pruning techniques can reduce inference costs for VGG-16 and ResNet-110 by up to 38% on CIFAR-10 while regaining close to the original accuracy by retraining the networks.
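A rough sketch of the L1-norm ranking behind this kind of structured filter pruning follows, assuming a single Conv2d weight tensor; actually removing the selected filters and the corresponding downstream channels is omitted, and the fraction is an illustrative assumption.

```python
# Illustrative sketch of L1-norm filter ranking for one convolutional layer.
import torch

def filters_to_prune(conv_weight, prune_fraction=0.3):
    """Indices of the output filters with the smallest L1 norms."""
    # conv_weight shape: (out_channels, in_channels, kH, kW)
    l1 = conv_weight.detach().abs().sum(dim=(1, 2, 3))
    n_prune = int(prune_fraction * l1.numel())
    return torch.argsort(l1)[:n_prune]            # smallest-norm filters first

# Example with a random 64-filter convolution.
w = torch.randn(64, 3, 3, 3)
print(filters_to_prune(w, prune_fraction=0.25))   # 16 filter indices
```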
The State of Sparsity in Deep Neural Networks
It is shown that unstructured sparse architectures learned through pruning cannot be trained from scratch to the same test-set performance as a model trained with joint sparsification and optimization, and the need for large-scale benchmarks in the field of model compression is highlighted.
Fast Sparse ConvNets
This work introduces a family of efficient sparse kernels for several hardware platforms and shows that sparse versions of the MobileNet v1 and MobileNet v2 architectures substantially outperform strong dense baselines on the efficiency-accuracy curve.