Corpus ID: 232417266

The Elastic Lottery Ticket Hypothesis

@inproceedings{Chen2021TheEL,
  title={The Elastic Lottery Ticket Hypothesis},
  author={Xiaohan Chen and Yu Cheng and Shuohang Wang and Zhe Gan and Jingjing Liu and Zhangyang Wang},
  booktitle={NeurIPS},
  year={2021}
}
The Lottery Ticket Hypothesis (LTH) has drawn keen attention to identifying sparse trainable subnetworks, or winning tickets, which can be trained in isolation to achieve performance similar to or even better than that of the full model. Despite many efforts, the most effective method for identifying such winning tickets is still Iterative Magnitude-based Pruning (IMP), which is computationally expensive and has to be run from scratch for every different network. A natural question that arises is… 
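For context, the sketch below illustrates the IMP loop referenced in the abstract, before the list of citing papers. It is a minimal illustration assuming a generic PyTorch model; train_fn is a hypothetical caller-supplied training routine, and the single global magnitude threshold is a simplifying assumption rather than the authors' exact procedure.

import copy
import torch

def iterative_magnitude_pruning(model, train_fn, rounds=5, prune_frac=0.2):
    # Save the original initialization; a winning ticket is a mask over these weights.
    init_state = copy.deepcopy(model.state_dict())
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters() if p.dim() > 1}

    for _ in range(rounds):
        # Rewind surviving weights to their initial values and re-apply the mask.
        model.load_state_dict(init_state)
        with torch.no_grad():
            for n, p in model.named_parameters():
                if n in masks:
                    p.mul_(masks[n])

        train_fn(model)  # hypothetical: trains the masked subnetwork to convergence

        # Prune the smallest prune_frac of the *remaining* weights by magnitude.
        surviving = torch.cat([p.detach()[masks[n].bool()].abs()
                               for n, p in model.named_parameters() if n in masks])
        threshold = torch.quantile(surviving, prune_frac)
        with torch.no_grad():
            for n, p in model.named_parameters():
                if n in masks:
                    masks[n] *= (p.abs() > threshold).float()

    return masks  # pair this mask with init_state to obtain the candidate winning ticket

Each round requires a full training run for every new network, which is the cost that motivates the question raised above.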
Citations

Universality of Deep Neural Network Lottery Tickets: A Renormalization Group Perspective
TLDR
It is found that iterative magnitude pruning, the method used for discovering winning tickets, is a renormalization group scheme, which opens the door to a wealth of existing numerical and theoretical tools, some of which are leveraged here to examine winning ticket universality in large scale lottery ticket experiments.
Universality of Winning Tickets: A Renormalization Group Perspective
TLDR
It is demonstrated that ResNet-50 models with transferable winning tickets have common properties, as would be expected from the theory, and that iterative magnitude pruning, the principal algorithm used for discovering winning tickets, is a renormalization group scheme.
Recent Advances on Neural Network Pruning at Initialization
TLDR
A generic formulation of neural network pruning is introduced, followed by a review of the major classic pruning topics, and a thorough, structured literature survey of pruning-at-initialization (PaI) methods is presented, organized into two major tracks (sparse training and sparse selection).
On the Neural Tangent Kernel Analysis of Randomly Pruned Wide Neural Networks
TLDR
This work shows that, for fully-connected neural networks pruned randomly at initialization, as the width of each layer grows to infinity, the empirical NTK of the pruned network converges to that of the original (unpruned) network up to an extra scaling factor.
Dimensionality Reduced Training by Pruning and Freezing Parts of a Deep Neural Network, a Survey
TLDR
This work is a survey of methods that reduce the number of trained weights in deep learning models throughout training, and proposes dimensionality-reduced training as an underlying mathematical model that covers both pruning and freezing during training.
Most Activation Functions Can Win the Lottery Without Excessive Depth
TLDR
It is shown that a depth L + 1 network is sufficient, which indicates that lottery tickets can be expected to exist at realistic, commonly used depths while requiring only logarithmic overparametrization.
Convolutional and Residual Networks Provably Contain Lottery Tickets
TLDR
It is proved that also modern architectures consisting of convolutional and residual layers that can be equipped with almost arbitrary activation functions can contain lottery tickets with high probability.
A Study of Designing Compact Audio-Visual Wake Word Spotting System Based on Iterative Fine-Tuning in Neural Network Pruning
TLDR
This paper investigates the design of a compact audio-visual WWS system that utilizes visual information to alleviate performance degradation, applying a neural network pruning strategy based on the lottery ticket hypothesis with iterative fine-tuning (LTH-IF) to both single-modal and multi-modal models.
ReaLPrune: ReRAM Crossbar-aware Lottery Ticket Pruned CNNs
TLDR
A novel crossbar-aware pruning strategy, referred to as ReaLPrune, is proposed, which can prune more than 90% of CNN weights, reducing hardware requirements and accelerating CNN training by ~20× compared to unpruned CNNs.
Lottery Tickets with Nonzero Biases
TLDR
This work extends multiple initialization schemes and existence proofs to nonzero biases, including explicit 'looks-linear' approaches for ReLU activation functions, to not only enable truly orthogonal parameter initialization but also reduce potential pruning errors.

References

SHOWING 1-10 OF 50 REFERENCES
Dynamic Model Pruning with Feedback
TLDR
A novel model compression method is proposed that produces a sparse trained model without additional overhead, by allowing dynamic allocation of the sparsity pattern and incorporating a feedback signal to reactivate prematurely pruned weights, yielding a performant sparse model in a single training pass.
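A minimal sketch of the dynamic-allocation-with-feedback idea summarized above, assuming a generic PyTorch model and optimizer; the function name dpf_training_step, the per-step mask recomputation, and the single global magnitude threshold are illustrative assumptions, not the paper's implementation.

import torch

def dpf_training_step(model, masks, loss_fn, x, y, optimizer, prune_frac):
    weights = [p for p in model.parameters() if p.dim() > 1]
    dense = [p.detach().clone() for p in weights]   # keep a dense copy of the weights

    # Forward/backward on the *masked* weights.
    with torch.no_grad():
        for p, m in zip(weights, masks):
            p.mul_(m)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()

    # Feedback: restore the dense weights and apply the update to them,
    # so prematurely pruned weights keep receiving gradient and can return.
    with torch.no_grad():
        for p, d in zip(weights, dense):
            p.copy_(d)
    optimizer.step()

    # Dynamically reallocate the sparsity pattern from the dense magnitudes.
    with torch.no_grad():
        flat = torch.cat([p.abs().flatten() for p in weights])
        threshold = torch.quantile(flat, prune_frac)
        for i, p in enumerate(weights):
            masks[i] = (p.abs() > threshold).float()
    return loss.item()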
SNIP: Single-shot Network Pruning based on Connection Sensitivity
TLDR
This work presents a new approach that prunes a given network once at initialization prior to training, and introduces a saliency criterion based on connection sensitivity that identifies structurally important connections in the network for the given task.
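A rough sketch of the connection-sensitivity saliency described above, computed on a single mini-batch for a generic PyTorch model; the function name snip_scores is hypothetical, and the paper's variance-scaling and top-k selection details are omitted.

import torch

def snip_scores(model, loss_fn, x, y):
    # Connection sensitivity on one mini-batch: |weight * gradient|, normalized.
    weights = [p for p in model.parameters() if p.dim() > 1]
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, weights)
    scores = [(w.detach() * g).abs() for w, g in zip(weights, grads)]
    total = sum(s.sum() for s in scores)
    return [s / total for s in scores]

Connections with the largest scores are kept and the rest are pruned once, before training begins.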
Picking Winning Tickets Before Training by Preserving Gradient Flow
TLDR
This work argues that efficient training requires preserving the gradient flow through the network, and proposes a simple but effective pruning criterion called Gradient Signal Preservation (GraSP), which achieves significantly better performance than the baseline at extreme sparsity levels.
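A rough sketch of a GraSP-style score, again for a generic PyTorch model; the Hessian-gradient product is obtained via double backpropagation, the function name grasp_scores is hypothetical, and details such as the temperature used in the paper are omitted.

import torch

def grasp_scores(model, loss_fn, x, y):
    weights = [p for p in model.parameters() if p.dim() > 1]
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, weights, create_graph=True)

    # Hessian-vector product H g via a second backward pass through g^T g_detached.
    gnorm = sum((g * g.detach()).sum() for g in grads)
    hg = torch.autograd.grad(gnorm, weights)

    # Score -w * (H g): weights whose removal least reduces gradient flow
    # (the highest-scoring ones, per the paper) are the pruning candidates.
    return [-(w.detach() * h) for w, h in zip(weights, hg)]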
Stabilizing the Lottery Ticket Hypothesis
TLDR
This paper modifies IMP to search for subnetworks that could have been obtained by pruning early in training rather than at iteration 0, and studies subnetwork "stability," finding that, as accuracy improves in this fashion, IMP subnetworks train to parameters closer to those of the full network and do so with improved consistency in the face of gradient noise.
Highway and Residual Networks learn Unrolled Iterative Estimation
TLDR
It is demonstrated that an alternative viewpoint based on unrolled iterative estimation, in which a group of successive layers iteratively refines its estimates of the same features instead of computing an entirely new representation, leads to the construction of Highway and Residual networks.
Pruning Neural Networks at Initialization: Why are We Missing the Mark?
TLDR
It is shown that, unlike pruning after training, accuracy is the same or higher when randomly shuffling which weights these methods prune within each layer or sampling new initial values, undermining the claimed justifications for these methods and suggesting broader challenges with the underlying pruning heuristics.
Playing Lottery Tickets with Vision and Language
TLDR
This work uses UNITER, one of the best-performing V+L models, as the testbed, and conducts the first empirical study to assess whether trainable subnetworks also exist in pre-trained V+L models.
Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training
TLDR
A new perspective on training deep neural networks capable of state-of-the-art performance without the need for expensive dense over-parameterization is introduced, by proposing the concept of In-Time Over-Parameterization (ITOP) in sparse training.
Selfish Sparse RNN Training
TLDR
This paper proposes SNT-ASGD, a novel variant of the averaged stochastic gradient optimizer, which significantly improves the performance of all sparse training methods for RNNs, and achieves state-of-the-art sparse training results, better than the dense-to-sparse methods.
EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets
TLDR
By slimming the self-attention and fully-connected sub-layers inside a transformer, this work is the first to identify structured winning tickets in the early stage of BERT training.
...