• Corpus ID: 235826360

Lottery Ticket Preserves Weight Correlation: Is It Desirable or Not?

@inproceedings{Liu2021LotteryTP,
  title={Lottery Ticket Preserves Weight Correlation: Is It Desirable or Not?},
  author={Ning Liu and Geng Yuan and Zhengping Che and Xuan Shen and Xiaolong Ma and Qing Jin and Jian Ren and Jian Tang and Sijia Liu and Yanzhi Wang},
  booktitle={ICML},
  year={2021}
}
In deep model compression, the recent finding of the “Lottery Ticket Hypothesis” (LTH) (Frankle & Carbin, 2018) pointed out that there could exist a winning ticket (i.e., a properly pruned subnetwork together with the original weight initialization) that can achieve performance competitive with the original dense network. However, it is not easy to observe such a winning property in many scenarios, where, for example, a relatively large learning rate is used even if it benefits training the original dense… 
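Below is a minimal sketch of the winning-ticket procedure the abstract refers to (train a dense network, prune by weight magnitude, rewind the surviving weights to their original initialization, and retrain). It assumes a network represented as a list of NumPy weight matrices; the train and evaluate callables are hypothetical placeholders, not code from the paper.

  import numpy as np

  def magnitude_masks(weights, sparsity):
      """Keep the largest-magnitude weights in each layer; zero out the rest."""
      masks = []
      for w in weights:
          k = int(round(sparsity * w.size))            # number of weights to prune
          thresh = np.sort(np.abs(w), axis=None)[k]    # per-layer magnitude threshold
          masks.append((np.abs(w) >= thresh).astype(w.dtype))
      return masks

  def find_winning_ticket(init_weights, train, evaluate, sparsity=0.8):
      """One-shot variant: train dense, prune, rewind to init, retrain the subnetwork."""
      trained = train([w.copy() for w in init_weights])        # 1) train the dense network
      masks = magnitude_masks(trained, sparsity)               # 2) prune small-magnitude weights
      ticket = [w0 * m for w0, m in zip(init_weights, masks)]  # 3) rewind survivors to original init
      retrained = train(ticket, masks=masks)                   # 4) retrain with the masks fixed
      return retrained, evaluate(retrained)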
Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective
TLDR
It is shown that PAC-Bayesian theory can provide an explicit understanding of the relationship between LTH and generalization behavior; existing algorithms for finding winning tickets are also revisited from a PAC-Bayesian perspective, providing new insights into these methods.
Plant 'n' Seek: Can You Find the Winning Ticket?
TLDR
To analyze the ability of state-of-the-art pruning to identify tickets of extreme sparsity, this work designs and hides winning tickets with desirable properties in randomly initialized neural networks and concludes that the current limitations in ticket sparsity are likely of an algorithmic rather than fundamental nature.
Convolutional and Residual Networks Provably Contain Lottery Tickets
TLDR
It is proved that modern architectures consisting of convolutional and residual layers, which can be equipped with almost arbitrary activation functions, also contain lottery tickets with high probability.
Most Activation Functions Can Win the Lottery Without Excessive Depth
TLDR
It is shown that a network of depth L + 1 is sufficient, which indicates that lottery tickets can be expected to be found at realistic, commonly used depths while only requiring logarithmic overparametrization.
Validating the Lottery Ticket Hypothesis with Inertial Manifold Theory
TLDR
This work theoretically verifies the validity of the Lottery Ticket Hypothesis and, compared with existing neural network pruning and LTH techniques, explores the possibility of theoretically lossless pruning as well as one-time pruning.
Lottery Tickets with Nonzero Biases
TLDR
This work extends multiple initialization schemes and existence proofs to nonzero biases, including explicit 'looks-linear' approaches for ReLU activation functions, which not only enable truly orthogonal parameter initialization but also reduce potential pruning errors.
On the Effect of Pruning on Adversarial Robustness
  • Artur Jordão, H. Pedrini
  • Computer Science
    2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
  • 2021
TLDR
It is demonstrated that pruning structures from convolutional networks increases not only generalization but also robustness to adversarial images (natural images with modified content), so this family of strategies provides additional benefits beyond computational performance and generalization.
MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge
TLDR
The results suggest that unforgettable examples can be identified in-situ even during the dynamic exploration of sparsity masks in the sparse training process, and therefore can be removed for further training speedup on edge devices.
F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization
TLDR
This work presents F8Net, a novel quantization framework consisting of only fixed-point 8-bit multiplication, which achieves comparable or better performance not only against existing quantization techniques using INT32 multiplication or floating-point arithmetic but also against the full-precision counterparts, reaching state-of-the-art results.
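As a toy illustration of the fixed-point representation that F8Net-style quantization relies on, the snippet below maps a float tensor to int8 values with a shared fractional length, so that multiplications could be carried out in integer arithmetic. The choice of fractional length here is a simplifying assumption for illustration, not the paper's method for selecting it.

  import numpy as np

  def to_fixed_point(x, frac_bits):
      scaled = np.round(x * (1 << frac_bits))            # shift values onto an integer grid
      return np.clip(scaled, -128, 127).astype(np.int8)  # saturate to the int8 range

  def from_fixed_point(q, frac_bits):
      return q.astype(np.float32) / (1 << frac_bits)     # map back to float for inspection

  x = np.array([0.74, -0.31, 1.2], dtype=np.float32)
  q = to_fixed_point(x, frac_bits=6)                     # 6 fractional bits -> step size 1/64
  print(q, from_fixed_point(q, 6))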

References

Showing 1–10 of 35 references
Drawing early-bird tickets: Towards more efficient training of deep networks
TLDR
This paper discovers, for the first time, that winning tickets can be identified at the very early training stage, termed early-bird (EB) tickets, via low-cost training schemes at large learning rates, which is consistent with recently reported observations that the key connectivity patterns of neural networks emerge early.
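A hedged sketch of one way to operationalize the early-bird idea: record the pruning mask at the end of each epoch and stop once consecutive masks barely change, measured here by normalized Hamming distance. The window size and tolerance are illustrative choices, not the paper's exact settings.

  import numpy as np

  def mask_distance(m1, m2):
      return float(np.mean(m1 != m2))      # fraction of mask entries that differ

  def is_early_bird(mask_history, window=3, tol=0.02):
      if len(mask_history) < window + 1:
          return False
      recent = mask_history[-(window + 1):]
      dists = [mask_distance(a, b) for a, b in zip(recent[:-1], recent[1:])]
      return max(dists) < tol              # masks have stabilized -> draw the EB ticket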
Proving the Lottery Ticket Hypothesis: Pruning is All You Need
TLDR
An even stronger hypothesis is proved, showing that for every bounded distribution and every target network with bounded weights, a sufficiently over-parameterized neural network with random weights contains a subnetwork with roughly the same accuracy as the target network, without any further training.
Stabilizing the Lottery Ticket Hypothesis
TLDR
This paper modifies IMP to search for subnetworks that could have been obtained by pruning early in training rather than at iteration 0, and studies subnetwork "stability," finding that, as accuracy improves in this fashion, IMP subnetworks train to parameters closer to those of the full network and do so with improved consistency in the face of gradient noise.
One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers
TLDR
It is found that, within the natural images domain, winning ticket initializations generalized across a variety of datasets, including Fashion MNIST, SVHN, CIFAR-10/100, ImageNet, and Places365, often achieving performance close to that of winning tickets generated on the same dataset.
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
TLDR
This work finds that dense, randomly-initialized, feed-forward networks contain subnetworks ("winning tickets") that - when trained in isolation - reach test accuracy comparable to the original network in a similar number of iterations, and articulates the "lottery ticket hypothesis".
The Lottery Ticket Hypothesis for Pre-trained BERT Networks
TLDR
This work finds matching subnetworks at 40% to 90% sparsity in BERT models at (pre-trained) initialization, a deviation from prior NLP research where they emerge only after some amount of training.
To prune, or not to prune: exploring the efficacy of pruning for model compression
TLDR
Across a broad range of neural network architectures, large-sparse models are found to consistently outperform small-dense models and achieve up to 10x reduction in number of non-zero parameters with minimal loss in accuracy.
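The gradual pruning studied in that paper ramps sparsity from an initial to a final value over the course of training; a sketch of a cubic ramp of that kind is shown below. Parameter names and default values are illustrative assumptions, not the paper's exact settings.

  def sparsity_at_step(t, s_initial=0.0, s_final=0.9, t_start=0, n_steps=100, step_span=1):
      """Cubic ramp from s_initial to s_final between t_start and t_start + n_steps * step_span."""
      if t < t_start:
          return s_initial
      progress = min((t - t_start) / float(n_steps * step_span), 1.0)
      return s_final + (s_initial - s_final) * (1.0 - progress) ** 3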
Rethinking the Value of Network Pruning
TLDR
It is found that, with the optimal learning rate, the "winning ticket" initialization as used in Frankle & Carbin (2019) does not bring improvement over random initialization, suggesting the need for more careful baseline evaluations in future research on structured pruning methods.
DropNet: Reducing Neural Network Complexity via Iterative Pruning
TLDR
DropNet, an iterative pruning method which prunes nodes/filters to reduce network complexity, is proposed and shown to be robust across diverse scenarios, including MLPs and CNNs using the MNIST, CIFAR-10 and Tiny ImageNet datasets.
HRank: Filter Pruning Using High-Rank Feature Map
TLDR
This paper proposes a novel filter pruning method by exploring the High Rank of feature maps (HRank), inspired by the discovery that the average rank of multiple feature maps generated by a single filter is always the same, regardless of the number of image batches CNNs receive.
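A minimal sketch of an HRank-style filter score, assuming feature_maps is a NumPy array of shape (batch, num_filters, height, width): for each filter, average the matrix rank of its output feature maps over a batch of images, and treat the filters with the lowest average rank as pruning candidates.

  import numpy as np

  def hrank_scores(feature_maps):
      batch, num_filters, _, _ = feature_maps.shape
      scores = np.zeros(num_filters)
      for f in range(num_filters):
          ranks = [np.linalg.matrix_rank(feature_maps[b, f]) for b in range(batch)]
          scores[f] = np.mean(ranks)       # higher average rank ~ more informative filter
      return scores                        # prune the lowest-scoring filters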
...