• Publications
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
TLDR
This work finds that dense, randomly-initialized, feed-forward networks contain subnetworks ("winning tickets") that - when trained in isolation - reach test accuracy comparable to the original network in a similar number of iterations, and articulates the "lottery ticket hypothesis".
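For context, winning tickets are found with iterative magnitude pruning (IMP): train, prune the smallest-magnitude weights, rewind the survivors to their original initialization, and repeat. A minimal PyTorch sketch of that loop, assuming a placeholder `train_fn` that performs a full training run (and that keeps pruned weights at zero during training), not the paper's actual code:

```python
import copy
import torch

def find_winning_ticket(model, train_fn, rounds=5, prune_frac=0.2):
    """Iteratively prune the smallest-magnitude weights, rewinding the
    survivors to their original initialization after each round."""
    init_state = copy.deepcopy(model.state_dict())                   # theta_0
    masks = {n: torch.ones_like(p, dtype=torch.bool)
             for n, p in model.named_parameters() if p.dim() > 1}    # weight matrices only

    for _ in range(rounds):
        train_fn(model)   # assumed full training run; enforcing the mask is left to train_fn
        with torch.no_grad():
            # Drop the smallest-magnitude surviving weights in each layer.
            for name, param in model.named_parameters():
                if name in masks:
                    cutoff = torch.quantile(param.abs()[masks[name]], prune_frac)
                    masks[name] &= param.abs() > cutoff
            # Rewind the remaining weights to their original initialization.
            model.load_state_dict(init_state)
            for name, param in model.named_parameters():
                if name in masks:
                    param.mul_(masks[name].to(param.dtype))
    return model, masks
```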
Linear Mode Connectivity and the Lottery Ticket Hypothesis
TLDR
This work finds that standard vision models become stable to SGD noise, in the sense of linear mode connectivity, early in training, and uses this technique to study iterative magnitude pruning (IMP), the procedure used by work on the lottery ticket hypothesis to identify subnetworks that could have trained in isolation to full accuracy.
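The stability criterion here is linear mode connectivity: two copies trained from the same weights under different SGD noise (e.g. data order) should be joined by a linear path with no rise in test error. A rough sketch of that check, assuming a placeholder `evaluate(model)` that returns test error:

```python
import copy
import torch

def error_barrier(model_a, model_b, evaluate, steps=20):
    """Test error along the linear path between two trained copies of a network.
    A barrier near zero means the copies are linearly mode connected, i.e. the
    network was stable to the SGD noise that made them diverge."""
    state_a, state_b = model_a.state_dict(), model_b.state_dict()
    probe = copy.deepcopy(model_a)
    endpoint_err = max(evaluate(model_a), evaluate(model_b))
    worst = endpoint_err
    for i in range(steps + 1):
        alpha = i / steps
        # Interpolate floating-point parameters and buffers; leave integer buffers alone.
        interp = {k: (1 - alpha) * state_a[k] + alpha * state_b[k]
                     if state_a[k].is_floating_point() else state_a[k]
                  for k in state_a}
        probe.load_state_dict(interp)
        worst = max(worst, evaluate(probe))
    return worst - endpoint_err   # height of the error barrier along the path
```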
What is the State of Neural Network Pruning?
TLDR
Issues with current practices in pruning are identified, concrete remedies are suggested, and ShrinkBench, an open-source framework that facilitates standardized evaluation of pruning methods, is introduced and used to compare various pruning techniques.
Comparing Rewinding and Fine-tuning in Neural Network Pruning
TLDR
Fine-tuning is compared to alternative retraining techniques, and learning rate rewinding is proposed, forming the basis of a network-agnostic pruning algorithm that matches the accuracy and compression ratios of several more network-specific state-of-the-art techniques.
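The contrast between the two retraining schemes can be sketched as follows; `train`, `original_schedule`, and the step counts are assumed placeholders rather than the paper's implementation:

```python
def fine_tune(pruned_model, train, final_lr, retrain_steps):
    # Fine-tuning: retrain the pruned network for `retrain_steps` at the final,
    # already-decayed learning rate of the original schedule.
    train(pruned_model, lr_schedule=lambda step: final_lr, steps=retrain_steps)

def lr_rewind(pruned_model, train, original_schedule, total_steps, retrain_steps):
    # Learning rate rewinding: same number of retraining steps, but replay the
    # last `retrain_steps` entries of the original learning-rate schedule.
    train(pruned_model,
          lr_schedule=lambda step: original_schedule(total_steps - retrain_steps + step),
          steps=retrain_steps)
```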
Stabilizing the Lottery Ticket Hypothesis
TLDR
This paper modifies IMP to search for subnetworks that could have been obtained by pruning early in training rather than at iteration 0, and studies subnetwork "stability," finding that - as accuracy improves in this fashion - IMP subnetworks train to parameters closer to those of the full network and do so with improved consistency in the face of gradient noise.
The Lottery Ticket Hypothesis for Pre-trained BERT Networks
TLDR
This work finds matching subnetworks at 40% to 90% sparsity in BERT models at (pre-trained) initialization, a deviation from prior NLP research where they emerge only after some amount of training.
The Lottery Ticket Hypothesis: Training Pruned Neural Networks
TLDR
The lottery ticket hypothesis and its connection to pruning are a step toward developing architectures, initializations, and training strategies that make it possible to solve the same problems with much smaller networks.
Pruning Neural Networks at Initialization: Why are We Missing the Mark?
TLDR
It is shown that, unlike pruning after training, accuracy is the same or higher when randomly shuffling which weights these pruning-at-initialization methods remove within each layer or when sampling new initial values, undermining the claimed justifications for these methods and suggesting broader challenges with the underlying pruning heuristics.
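One of the ablations referred to above is a layerwise shuffle of the pruning mask; a minimal sketch, assuming masks are boolean tensors keyed by parameter name:

```python
import torch

def shuffle_masks_within_layers(masks):
    """Keep each layer's sparsity level but randomly reassign which weights are
    pruned. If accuracy does not drop, the per-weight scores computed at
    initialization carried little information beyond the layerwise ratios."""
    shuffled = {}
    for name, mask in masks.items():
        flat = mask.flatten()
        perm = torch.randperm(flat.numel(), device=flat.device)
        shuffled[name] = flat[perm].reshape(mask.shape)
    return shuffled
```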
Example-directed synthesis: a type-theoretic interpretation
TLDR
It is demonstrated that examples can, in general, be interpreted as refinement types, and synthesis is formalized as proof search in a sequent calculus with intersection and union refinements that is proven sound with respect to a conventional type system.
The Early Phase of Neural Network Training
TLDR
It is found that deep networks are not robust to reinitializing with random weights while maintaining signs, and that weight distributions are highly non-independent even after only a few hundred iterations.
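A minimal sketch of the sign-preserving reinitialization mentioned above, assuming PyTorch and a generic initializer (the paper's exact distributions may differ):

```python
import torch

def reinit_keep_signs(model, init_fn=torch.nn.init.kaiming_normal_):
    """Draw fresh random weights but keep the sign pattern the network has
    already learned; `init_fn` is an assumed per-tensor initializer."""
    with torch.no_grad():
        for param in model.parameters():
            if param.dim() > 1:                # weight matrices / conv kernels only
                signs = param.sign()
                init_fn(param)                 # overwrite with fresh random values
                param.abs_().mul_(signs)       # reapply the original signs
```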
...
...