Corpus ID: 239049694

Towards strong pruning for lottery tickets with non-zero biases

@article{Fischer2021TowardsSP,
  title={Towards strong pruning for lottery tickets with non-zero biases},
  author={Jonas Fischer and Rebekka Burkholz},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.11150}
}
The strong lottery ticket hypothesis holds the promise that pruning randomly initialized deep neural networks could offer a computationally efficient alternative to deep learning with stochastic gradient descent. Common parameter initialization schemes and existence proofs, however, are focused on networks with zero biases, thus foregoing the potential universal approximation property of pruning. To fill this gap, we extend multiple initialization schemes and existence proofs to non-zero biases… 
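
For intuition, a minimal numerical sketch of this setting (the toy shapes, uniform initialization, and random mask search are illustrative assumptions, not the paper's construction): in the strong lottery ticket regime, "learning" amounts only to choosing a binary mask over the fixed random weights and, here, also over the non-zero random biases.

import numpy as np

rng = np.random.default_rng(0)

# Randomly initialized 2-layer ReLU network with NON-ZERO biases.
# Shapes and distributions are illustrative, not the paper's scheme.
W1, b1 = rng.uniform(-1, 1, (16, 2)), rng.uniform(-1, 1, 16)
W2, b2 = rng.uniform(-1, 1, (1, 16)), rng.uniform(-1, 1, 1)

def forward(x, m_W1, m_b1, m_W2, m_b2):
    """Forward pass of the pruned network: the masks select a subnetwork."""
    h = np.maximum(0.0, (W1 * m_W1) @ x + b1 * m_b1)
    return (W2 * m_W2) @ h + b2 * m_b2

# Toy regression target and data.
X = rng.uniform(-1, 1, (2, 64))
y = np.sin(X.sum(axis=0))

# "Training" = searching over binary masks only; the parameters stay fixed.
best_loss, best_masks = np.inf, None
for _ in range(2000):
    masks = [rng.integers(0, 2, p.shape).astype(float) for p in (W1, b1, W2, b2)]
    pred = np.array([forward(X[:, i], *masks).item() for i in range(X.shape[1])])
    loss = np.mean((pred - y) ** 2)
    if loss < best_loss:
        best_loss, best_masks = loss, masks

print(f"best mask-only loss: {best_loss:.4f}")
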

Citations

On the Existence of Universal Lottery Tickets
TLDR
This work theoretically proves that not only do such universal tickets exist but they also do not require further training, and introduces a couple of technical innovations related to pruning for strong lottery tickets, including extensions of subset sum results and a strategy to leverage higher amounts of depth.
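
The subset sum ingredient mentioned in this summary can be illustrated with a toy computation: a target parameter value is approximated by summing a suitable subset of sufficiently many random candidates, which is the basic reason an over-parameterized random network can hide an accurate subnetwork. The brute-force search below is only a sketch under assumed distributions and sizes.

import itertools
import numpy as np

rng = np.random.default_rng(1)

target = 0.37                        # a target parameter we would like to realize
candidates = rng.uniform(-1, 1, 16)  # random "excess" parameters available for pruning

# Brute-force the best subset: keep (mask = 1) or prune (mask = 0) each candidate.
best_err, best_subset = np.inf, ()
for r in range(len(candidates) + 1):
    for subset in itertools.combinations(range(len(candidates)), r):
        err = abs(candidates[list(subset)].sum() - target)
        if err < best_err:
            best_err, best_subset = err, subset

approx = candidates[list(best_subset)].sum()
print(f"target={target:+.4f}  approx={approx:+.4f}  error={best_err:.2e}")
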
Plant 'n' Seek: Can You Find the Winning Ticket?
TLDR
This work derives a framework to plant and hide target architectures within large randomly initialized neural networks and finds that current limitations of pruning algorithms to identify extremely sparse tickets are likely of algorithmic rather than fundamental nature.

References

SHOWING 1-10 OF 26 REFERENCES
Proving the Lottery Ticket Hypothesis: Pruning is All You Need
TLDR
An even stronger hypothesis is proved, showing that for every bounded distribution and every target network with bounded weights, a sufficiently over-parameterized neural network with random weights contains a subnetwork with roughly the same accuracy as the target network, without any further training.
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
TLDR
This work finds that dense, randomly-initialized, feed-forward networks contain subnetworks ("winning tickets") that - when trained in isolation - reach test accuracy comparable to the original network in a similar number of iterations, and articulate the "lottery ticket hypothesis".
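
A minimal sketch of that recipe (the model, data, pruning fraction, and training loop are placeholder assumptions, not the paper's setup): train the dense network, prune the smallest-magnitude weights globally, and rewind the surviving weights to their original initialization before retraining.

import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data and model; sizes are placeholders.
X, y = torch.randn(256, 10), torch.randn(256, 1)
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
init_state = copy.deepcopy(model.state_dict())   # remember the initialization

def train(model, steps=200):
    opt = torch.optim.SGD(model.parameters(), lr=0.05)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(X), y)
        loss.backward()
        opt.step()
    return loss.item()

train(model)

# Global magnitude pruning: keep the largest 20% of weights.
all_weights = torch.cat([p.abs().flatten() for n, p in model.named_parameters() if "weight" in n])
threshold = all_weights.quantile(0.8)
masks = {n: (p.abs() >= threshold).float()
         for n, p in model.named_parameters() if "weight" in n}

# Rewind surviving weights to their original initialization (the "winning ticket").
# A faithful implementation would re-apply the masks after every optimizer step;
# that bookkeeping is omitted here for brevity.
model.load_state_dict(init_state)
with torch.no_grad():
    for n, p in model.named_parameters():
        if n in masks:
            p.mul_(masks[n])

print("ticket loss after retraining:", train(model))
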
Picking Winning Tickets Before Training
2019
Overparameterization has been shown to benefit both the optimization and generalization of neural networks, but large networks are resource hungry at both training and test time. Network pruning can…
Initialization of ReLUs for Dynamical Isometry
TLDR
The joint signal output distribution is derived exactly, without mean field assumptions, for fully-connected networks with Gaussian weights and biases, and deviations from the mean field results are analyzed.
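
For context, the usual mean-field recursion for the pre-activation second moment of a ReLU layer with Gaussian weights of variance sigma_w^2 / fan-in and Gaussian biases of variance sigma_b^2 is q_{l+1} = (sigma_w^2 / 2) q_l + sigma_b^2; the cited paper derives the exact joint distribution rather than this approximation. The sketch below, under these assumed distributions and sizes, compares the mean-field prediction with a direct simulation.

import numpy as np

rng = np.random.default_rng(0)
width, depth, n_samples = 512, 10, 200
sigma_w2, sigma_b2 = 2.0, 0.1   # illustrative variances, not the paper's choice

# Monte Carlo: propagate Gaussian inputs through a random ReLU network
# with Gaussian weights AND non-zero Gaussian biases.
z = rng.normal(0.0, 1.0, (width, n_samples))   # treat the input as layer-0 pre-activations
q_mc = [np.mean(z**2)]
q_mf = [np.mean(z**2)]
for _ in range(depth):
    W = rng.normal(0.0, np.sqrt(sigma_w2 / width), (width, width))
    b = rng.normal(0.0, np.sqrt(sigma_b2), (width, 1))
    z = W @ np.maximum(0.0, z) + b                       # next layer's pre-activations
    q_mc.append(np.mean(z**2))
    q_mf.append(sigma_w2 / 2.0 * q_mf[-1] + sigma_b2)    # mean-field recursion

for l, (mc, mf) in enumerate(zip(q_mc, q_mf)):
    print(f"layer {l:2d}: simulated q = {mc:.3f}   mean-field q = {mf:.3f}")
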
The Shattered Gradients Problem: If resnets are the answer, then what is the question?
TLDR
It is shown that the correlation between gradients in standard feedforward networks decays exponentially with depth, resulting in gradients that resemble white noise, whereas the gradients in architectures with skip connections are far more resistant to shattering, decaying only sublinearly.
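
The effect can be observed with a small experiment (the widths, perturbation size, and He initialization are assumptions of this sketch): compute the gradient of the scalar output with respect to the input at two nearby points and compare the gradients' cosine similarity as a plain feedforward ReLU network, without skip connections, gets deeper.

import torch
import torch.nn as nn

torch.manual_seed(0)

def input_gradient(net, x):
    """Gradient of the scalar network output with respect to the input x."""
    x = x.clone().requires_grad_(True)
    net(x).sum().backward()
    return x.grad.detach()

width = 128
x1 = torch.randn(1, width)
x2 = x1 + 0.05 * torch.randn(1, width)   # a nearby input

for depth in (2, 8, 32):
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(width, width), nn.ReLU()]
    net = nn.Sequential(*layers, nn.Linear(width, 1))
    for m in net.modules():               # He initialization to keep signals well scaled
        if isinstance(m, nn.Linear):
            nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
            nn.init.zeros_(m.bias)
    g1, g2 = input_gradient(net, x1), input_gradient(net, x2)
    cos = nn.functional.cosine_similarity(g1.flatten(), g2.flatten(), dim=0)
    print(f"depth {depth:2d}: gradient cosine similarity = {cos.item():.3f}")
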
Robust Pruning at Initialization
TLDR
A comprehensive theoretical analysis of magnitude- and gradient-based pruning at initialization and of the training of sparse architectures is provided, and novel principled approaches are proposed and validated experimentally on a variety of NN architectures.
Pruning Convolutional Neural Networks for Resource Efficient Inference
TLDR
It is shown that pruning can lead to more than 10x theoretical (5x practical) reduction in adapted 3D-convolutional filters with a small drop in accuracy in a recurrent gesture classifier.
SNIP: Single-shot Network Pruning based on Connection Sensitivity
TLDR
This work presents a new approach that prunes a given network once at initialization prior to training, and introduces a saliency criterion based on connection sensitivity that identifies structurally important connections in the network for the given task.
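
A hedged sketch of a connection-sensitivity criterion in this spirit (the model, data, and sparsity level are illustrative assumptions, not the authors' released implementation): score every weight by |gradient x weight| on one mini-batch at initialization, normalize, and keep the most sensitive fraction of connections.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy classification batch and model at initialization (shapes are illustrative).
X = torch.randn(128, 20)
y = torch.randint(0, 3, (128,))
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))

# One forward/backward pass on a single mini-batch, before any training.
loss = nn.functional.cross_entropy(model(X), y)
loss.backward()

# Connection sensitivity: |g * w|, normalized over all connections.
scores = {n: (p.grad * p).abs()
          for n, p in model.named_parameters() if p.dim() > 1}
total = sum(s.sum() for s in scores.values())
scores = {n: s / total for n, s in scores.items()}

# Keep the top 10% most sensitive connections (global threshold).
all_scores = torch.cat([s.flatten() for s in scores.values()])
threshold = all_scores.quantile(0.9)
masks = {n: (s >= threshold).float() for n, s in scores.items()}

for n, m in masks.items():
    print(f"{n}: kept {int(m.sum())}/{m.numel()} connections")
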
The Emergence of Spectral Universality in Deep Networks
TLDR
This work uses powerful tools from free probability theory to provide a detailed analytic understanding of how a deep network's Jacobian spectrum depends on various hyperparameters including the nonlinearity, the weight and bias distributions, and the depth.
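
As an empirical counterpart to such analyses (a numerical sketch with an assumed ReLU nonlinearity and Gaussian weights and biases, not the free-probability calculation itself): the input-output Jacobian at a point is the product of the layer weight matrices and the diagonal activation-derivative matrices, and its singular values can be computed directly for a random network to see how the spectrum depends on depth and the weight/bias scales.

import numpy as np

rng = np.random.default_rng(0)
width, depth = 256, 20
sigma_w2, sigma_b2 = 2.0, 0.05   # assumed weight and bias variances

# Forward pass of a random ReLU net at one input, accumulating the
# input-output Jacobian J = D_L W_L ... D_1 W_1 (D_l = diag of ReLU derivatives).
x = rng.normal(0.0, 1.0, width)
J = np.eye(width)
h = x
for _ in range(depth):
    W = rng.normal(0.0, np.sqrt(sigma_w2 / width), (width, width))
    b = rng.normal(0.0, np.sqrt(sigma_b2), width)
    z = W @ h + b
    D = np.diag((z > 0).astype(float))    # derivative of ReLU at this input
    J = D @ W @ J
    h = np.maximum(0.0, z)

sv = np.linalg.svd(J, compute_uv=False)
print(f"depth {depth}: max sv = {sv.max():.3f}, mean sv = {sv.mean():.3f}, min sv = {sv.min():.2e}")
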
A Signal Propagation Perspective for Pruning Neural Networks at Initialization
TLDR
By noting that connection sensitivity is a form of gradient, this work formally characterizes initialization conditions that ensure reliable connection sensitivity measurements, which in turn yield effective pruning results; the resulting modifications to the existing pruning-at-initialization method improve results on all tested network models for image classification tasks.