Corpus ID: 219687773

# Optimal Lottery Tickets via SubsetSum: Logarithmic Over-Parameterization is Sufficient

@article{Pensia2020OptimalLT,
title={Optimal Lottery Tickets via SubsetSum: Logarithmic Over-Parameterization is Sufficient},
author={Ankit Pensia and Shashank Rajput and Alliot Nagle and Harit Vishwakarma and Dimitris Papailiopoulos},
journal={ArXiv},
year={2020},
volume={abs/2006.07990}
}
The strong *lottery ticket hypothesis* (LTH) postulates that one can approximate any target neural network by only pruning the weights of a sufficiently over-parameterized random network. A recent work by Malach et al. [MalachEtAl20] establishes the first theoretical analysis for the strong LTH: one can provably approximate a neural network of width $d$ and depth $l$ by pruning a random one that is a factor $O(d^4l^2)$ wider and twice as deep. This polynomial over-parameterization…
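The SubsetSum idea named in the title can be illustrated numerically. This is a minimal, hypothetical sketch, not the paper's construction.

It assumes the relevant primitive is approximating one target weight in [-1, 1] by the sum of a subset of i.i.d. Uniform[-1, 1] random values, which stand for the random weights that pruning may keep or drop. It finds the best subset by exhaustive search, which is only feasible for small n. The function names `best_subset_sum` and `approximation_error_curve`, the target value 0.3721 and the seed are illustrative choices.

```python
import itertools
import random


def best_subset_sum(samples, target):
    """Exhaustively search all 2^n subsets of `samples` (including the empty
    one) and return (indices, subset_sum, abs_error) for the subset whose sum
    is closest to `target`. Exponential time: for illustration with small n."""
    best = ((), 0.0, abs(target))  # the empty subset sums to 0
    n = len(samples)
    for r in range(1, n + 1):
        for idx in itertools.combinations(range(n), r):
            s = sum(samples[i] for i in idx)
            err = abs(s - target)
            if err < best[2]:
                best = (idx, s, err)
    return best


def approximation_error_curve(target, max_n, seed=0):
    """Draw max_n i.i.d. Uniform[-1, 1] values once (hypothetical stand-ins
    for random network weights) and report the best subset-sum error using
    only the first n of them, for n = 1..max_n."""
    rng = random.Random(seed)
    samples = [rng.uniform(-1.0, 1.0) for _ in range(max_n)]
    return [best_subset_sum(samples[:n], target)[2] for n in range(1, max_n + 1)]


if __name__ == "__main__":
    target = 0.3721  # hypothetical "target weight" in [-1, 1]
    for n, err in enumerate(approximation_error_curve(target, 14), start=1):
        print(f"n={n:2d}  best |subset sum - target| = {err:.2e}")
```

Every subset of the first n samples remains available once more samples are drawn, so the best error can never increase with n. Running the script shows how quickly it shrinks for one random draw. That is the kind of behavior the title's claim of logarithmic over-parameterization relies on; no particular rate is asserted here.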

#### Citations

Greedy Optimization Provably Wins the Lottery: Logarithmic Number of Winning Tickets is Enough
• Computer Science, Mathematics
• NeurIPS
• 2020
A greedy optimization-based pruning method with the guarantee that the discrepancy between the pruned network and the original network decays at an exponentially fast rate with respect to the size of the pruned network, under weak assumptions that apply to most practical settings.
Provable Benefits of Overparameterization in Model Compression: From Double Descent to Pruning Neural Networks
• Computer Science, Mathematics
• AAAI
• 2021
The theory presented addresses the following core question: "should one train a small model from the beginning, or first train a large model and then prune?" It analytically identifies regimes in which, even if the location of the most informative features is known, one is better off fitting a large model and then pruning rather than simply training with the known informative features.
GANs Can Play Lottery Tickets Too
• Computer Science
• ICLR
• 2021
Extensive experimental results demonstrate that the subnetworks found substantially outperform previous state-of-the-art GAN compression approaches in both image generation and image-to-image translation GANs, and show the powerful transferability of these subnetworks to unseen tasks.
A Probabilistic Approach to Neural Network Pruning
• Computer Science
• ICML
• 2021
This work theoretically studies the performance of two pruning techniques (random and magnitude-based) on FCNs and CNNs and establishes that there exist pruned networks whose expressive power is within any specified bound of the target network.
Pruning Randomly Initialized Neural Networks with Iterative Randomization
• Daiki Chijiwa, Tomohiro Inoue
• Computer Science, Mathematics
• ArXiv
• 2021
A novel framework to prune randomly initialized neural networks by iteratively randomizing weight values (IteRand) is introduced, and the analysis indicates that the randomizing operations are provably effective at reducing the required number of parameters.
MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks
• Computer Science
• ArXiv
• 2021
It is shown that binary mixing in features, particularly with rectangular patches from CutMix, enhances results by making subnetworks stronger and more diverse, which improves the state of the art for image classification on the CIFAR-100 and Tiny ImageNet datasets.
A Gradient Flow Framework For Analyzing Network Pruning
• Computer Science, Mathematics
• ICLR
• 2021
A general gradient-flow-based framework is developed that unifies state-of-the-art importance measures through the norm of model parameters. It establishes several results related to pruning models early in training, including that magnitude-based pruning preserves first-order model evolution dynamics and is appropriate for pruning minimally trained models.
A Winning Hand: Compressing Deep Networks Can Improve Out-Of-Distribution Robustness
• Computer Science
• ArXiv
• 2021
This work is able to create extremely compact CARDs that are dramatically more robust than their significantly larger and full-precision counterparts while matching (or beating) their test accuracy, simply by pruning and/or quantizing.
Multi-Prize Lottery Ticket Hypothesis: Finding Accurate Binary Neural Networks by Pruning A Randomly Weighted Network
• Computer Science
• ICLR
• 2021
Empirical results indicate that as models grow deeper and wider, multi-prize tickets start to reach similar (and sometimes even higher) test accuracy compared to their significantly larger and full-precision counterparts that have been weight-trained.
Playing Lottery Tickets with Vision and Language
This work uses UNITER, one of the best-performing V+L models, as the testbed, and conducts the first empirical study to assess whether trainable subnetworks also exist in pre-trained V+L models.

#### References

Showing 1-10 of 42 references
Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask
• Computer Science, Mathematics
• NeurIPS
• 2019
This paper studies the three critical components of the Lottery Ticket algorithm, showing that each may be varied significantly without impacting the overall results. It shows why setting weights to zero is important, how signs are all you need to make the reinitialized network train, and why masking behaves like training.
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
• Computer Science, Mathematics
• ICLR
• 2019
This work finds that dense, randomly-initialized, feed-forward networks contain subnetworks ("winning tickets") that, when trained in isolation, reach test accuracy comparable to the original network in a similar number of iterations, and articulates the "lottery ticket hypothesis".
Proving the Lottery Ticket Hypothesis: Pruning is All You Need
• Computer Science, Mathematics
• ICML
• 2020
An even stronger hypothesis is proved, showing that for every bounded distribution and every target network with bounded weights, a sufficiently over-parameterized neural network with random weights contains a subnetwork with roughly the same accuracy as the target network, without any further training.
Using Winning Lottery Tickets in Transfer Learning for Convolutional Neural Networks
• Computer Science
• 2019 International Joint Conference on Neural Networks (IJCNN)
• 2019
This paper lays the groundwork for a transfer learning method that reduces the original network to its essential connections and does not require freezing entire layers, and discusses how this method can be an alternative to transfer learning, with positive initial results.
To prune, or not to prune: exploring the efficacy of pruning for model compression
• Computer Science, Mathematics
• ICLR
• 2018
Across a broad range of neural network architectures, large-sparse models are found to consistently outperform small-dense models and achieve up to 10x reduction in number of non-zero parameters with minimal loss in accuracy.
Pruning from Scratch
This work finds that pre-training an over-parameterized model is not necessary for obtaining the target pruned structure, and empirically shows that more diverse pruned structures can be directly pruned from randomly initialized weights, including potential models with better performance.
The Search for Sparse, Robust Neural Networks
• Computer Science, Mathematics
• ArXiv
• 2019
An extensive empirical evaluation and analysis testing the Lottery Ticket Hypothesis with adversarial training is performed, and it is shown that this approach enables us to find sparse, robust neural networks.
What is the State of Neural Network Pruning?
• Computer Science, Mathematics
• MLSys
• 2020
Issues with current practices in pruning are identified, concrete remedies are suggested, and ShrinkBench, an open-source framework to facilitate standardized evaluations of pruning methods, is introduced for comparing various pruning techniques.
Reducibility Among Combinatorial Problems
• R. Karp
• Computer Science
• 50 Years of Integer Programming
• 2010
Throughout the 1960s I worked on combinatorial optimization problems including logic circuit design with Paul Roth and assembly line balancing and the traveling salesman problem with Mike Held, which made me aware of the importance of the distinction between polynomial-time and superpolynomial-time solvability.
Probabilistic Analysis of Optimum Partitioning
• Mathematics
• 1986
Given a set of n items with real-valued sizes, the optimum partition problem asks how it can be partitioned into two subsets so that the absolute value of the difference of the sums of the sizes over…
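The partition problem summarized in the last entry is closely related to SubsetSum. Since sum(B) = total - sum(A), minimizing |sum(A) - sum(B)| is the same as finding a subset sum closest to half the total.

A brute-force sketch makes the definition concrete. The function name `optimum_partition` and the example sizes are hypothetical. Exhaustive search is used only because it is easy to check, not because the cited probabilistic analysis works this way.

```python
import itertools


def optimum_partition(sizes):
    """Brute-force the optimum partition problem: choose subset A (by index)
    of `sizes`, with B the remaining items, to minimize |sum(A) - sum(B)|.
    Returns (A_indices, difference). Exponential in len(sizes)."""
    total = sum(sizes)
    best_idx, best_diff = (), abs(total)  # A empty, B holds every item
    n = len(sizes)
    for r in range(1, n + 1):
        for idx in itertools.combinations(range(n), r):
            a = sum(sizes[i] for i in idx)
            diff = abs(total - 2 * a)  # |sum(A) - sum(B)| with sum(B) = total - sum(A)
            if diff < best_diff:
                best_idx, best_diff = idx, diff
    return best_idx, best_diff


print(optimum_partition([3, 1, 4, 2, 2]))  # → ((2, 3), 0): {4, 2} vs {3, 1, 2}
```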