Corpus ID: 247026123

How I Learned to Stop Worrying and Love Retraining

Max Zimmer, Christoph Spiegel, Sebastian Pokutta
Many Neural Network Pruning approaches consist of several iterative training and pruning steps, seemingly losing a significant amount of their performance after pruning and then recovering it in the subsequent retraining phase. Recent works of Renda et al. (2020) and Le & Hua (2021) demonstrate the significance of the learning rate schedule during the retraining phase and propose specific heuristics for choosing such a schedule for IMP (Han et al., 2015). We place these findings in the context… 

Compression-aware Training of Neural Networks using Frank-Wolfe

This work proposes leveraging k-support norm ball constraints and demonstrates improvements over the results of Miao et al.

Network Pruning That Matters: A Case Study on Retraining Variants

It is found that the reason behind the success of learning rate rewinding is the use of a large learning rate, and the crucial role of the learning rate schedule in retraining pruned networks is emphasized – a detail often overlooked by practitioners when implementing network pruning.
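The contrast between learning rate rewinding and standard fine-tuning can be sketched as two retraining schedules. This is a minimal pure-Python illustration under an assumed cosine schedule; the function names and default values are illustrative, not taken from the paper.

```python
import math

def cosine_schedule(epoch, total_epochs, lr_max=0.1):
    """Cosine-annealed learning rate of the (hypothetical) original training run."""
    return 0.5 * lr_max * (1 + math.cos(math.pi * epoch / total_epochs))

def finetune_lr(epoch, lr_final=0.001):
    """Fine-tuning baseline: small constant learning rate after pruning."""
    return lr_final

def rewound_lr(epoch, rewind_epoch, total_epochs, lr_max=0.1):
    """Learning rate rewinding: replay the tail of the original schedule,
    starting from a large learning rate, instead of fine-tuning at a small one."""
    return cosine_schedule(rewind_epoch + epoch, total_epochs, lr_max)

# Retraining for 30 epochs, rewinding the schedule to epoch 60 of a 90-epoch run:
retrain = [rewound_lr(e, rewind_epoch=60, total_epochs=90) for e in range(30)]
```

The rewound schedule starts retraining at the (large) learning rate the original run had at the rewind point and decays from there, which is the detail the case study identifies as decisive.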

Stabilizing the Lottery Ticket Hypothesis

This paper modifies IMP to search for subnetworks that could have been obtained by pruning early in training rather than at iteration 0, and studies subnetwork "stability," finding that - as accuracy improves in this fashion - IMP subnetworks train to parameters closer to those of the full network and do so with improved consistency in the face of gradient noise.

Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints

The key to a good schedule is budgeted convergence, a phenomenon whereby the gradient vanishes at the end of each allowed budget, and it is shown that budget-aware learning schedules readily outperform existing approaches under (the practical but under-explored) budgeted training setting.
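A budget-aware schedule of the kind described above can be sketched as a learning rate that anneals linearly to zero exactly at the end of the allowed budget, so the step size vanishes when the budget runs out. This is a minimal sketch assuming a linear decay; the paper's exact schedule family may differ.

```python
def budgeted_linear_lr(step, budget_steps, lr_base=0.1):
    """Budget-aware schedule: anneal the learning rate linearly to zero exactly
    at the end of the allowed budget (assumed linear decay, illustrative only)."""
    if step >= budget_steps:
        return 0.0
    return lr_base * (1 - step / budget_steps)
```

The same function works for any budget, so halving the budget simply compresses the decay rather than truncating a schedule tuned for a longer run.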

The Two Regimes of Deep Network Training

Two distinct phases of training are isolated: one exhibits rather poor performance from an optimization point of view but is the primary contributor to model generalization, while the other exhibits much more "convex-like" optimization behavior but, used in isolation, produces models that generalize poorly.

Understanding deep learning requires rethinking generalization

These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth two neural networks already have perfect finite sample expressivity.

SNIP: Single-shot Network Pruning based on Connection Sensitivity

This work presents a new approach that prunes a given network once at initialization prior to training, and introduces a saliency criterion based on connection sensitivity that identifies structurally important connections in the network for the given task.
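SNIP's connection-sensitivity criterion scores each weight by the magnitude of the gradient-weight product at initialization and keeps the top-scoring fraction. A minimal pure-Python sketch over flattened weight and gradient lists (names and the tie-handling are illustrative assumptions, not the authors' code):

```python
def snip_prune_mask(weights, grads, keep_ratio):
    """Connection sensitivity s_j = |g_j * w_j| at initialization;
    keep the top keep_ratio fraction of connections (ties at the
    threshold are all kept, so slightly more may survive)."""
    saliency = [abs(g * w) for g, w in zip(grads, weights)]
    k = max(1, int(round(keep_ratio * len(weights))))
    threshold = sorted(saliency, reverse=True)[k - 1]
    return [1 if s >= threshold else 0 for s in saliency]
```

Because the scores are computed from a single mini-batch gradient before any training, the network is pruned once at initialization and then trained normally with the fixed mask.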

The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

This work finds that dense, randomly-initialized, feed-forward networks contain subnetworks ("winning tickets") that - when trained in isolation - reach test accuracy comparable to the original network in a similar number of iterations, and articulate the "lottery ticket hypothesis".
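The iterative magnitude pruning (IMP) loop used to find winning tickets can be sketched as: train, prune the smallest-magnitude surviving weights, rewind the survivors to their initial values, and repeat. A minimal pure-Python sketch over a flat weight list, with an abstract `train` callable standing in for actual optimization (all names are illustrative assumptions):

```python
def iterative_magnitude_pruning(train, init_weights, rounds, prune_frac):
    """Sketch of IMP with rewinding to initialization. `train` maps masked
    initial weights to trained weights; prune_frac of the surviving weights
    (those smallest in magnitude, ties included) are removed each round."""
    mask = [1] * len(init_weights)
    for _ in range(rounds):
        # Rewind surviving weights to their initial values, then train.
        weights = train([w * m for w, m in zip(init_weights, mask)])
        survivors = sorted(abs(w) for w, m in zip(weights, mask) if m)
        n_prune = int(prune_frac * len(survivors))
        if n_prune == 0:
            break
        threshold = survivors[n_prune - 1]
        mask = [0 if m and abs(w) <= threshold else m
                for w, m in zip(weights, mask)]
    return mask  # candidate winning-ticket mask for training in isolation
```

The returned mask, applied to the original initialization, defines the subnetwork that the hypothesis predicts can be trained in isolation to comparable accuracy.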

The State of Sparsity in Deep Neural Networks

It is shown that unstructured sparse architectures learned through pruning cannot be trained from scratch to the same test set performance as a model trained with joint sparsification and optimization, and the need for large-scale benchmarks in the field of model compression is highlighted.

Pruning neural networks without any data by iteratively conserving synaptic flow

The data-agnostic pruning algorithm challenges the existing paradigm that, at initialization, data must be used to quantify which synapses are important, and consistently competes with or outperforms existing state-of-the-art pruning algorithms at initialization over a range of models, datasets, and sparsity constraints.
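The synaptic-flow score behind this data-agnostic method can be computed in closed form for a tiny network: on an all-ones input with absolute-valued weights, the objective is R = 1ᵀ|W2||W1|1 and each weight's score is (∂R/∂θ)·θ. A minimal pure-Python sketch for two linear layers (the closed-form expansion below is my derivation for this special case, not the authors' general implementation):

```python
def synflow_scores(W1, W2):
    """Synaptic-flow saliency for y = W2 @ (W1 @ x) on an all-ones input x,
    using absolute weights: score(theta) = (dR/dtheta) * theta where
    R = sum(|W2| @ |W1| @ ones). No data is needed."""
    n_hidden, n_out = len(W1), len(W2)
    # dR/dW1[i][j] = sum_k |W2[k][i]|; dR/dW2[k][i] = sum_j |W1[i][j]|.
    col_out = [sum(abs(W2[k][i]) for k in range(n_out)) for i in range(n_hidden)]
    row_in = [sum(abs(w) for w in W1[i]) for i in range(n_hidden)]
    s1 = [[abs(w) * col_out[i] for w in W1[i]] for i in range(n_hidden)]
    s2 = [[abs(W2[k][i]) * row_in[i] for i in range(n_hidden)] for k in range(n_out)]
    return s1, s2
```

A useful sanity check is the conservation property the paper exploits: the total synaptic flow through each layer is identical, which is why the method prunes iteratively instead of all at once (one-shot pruning by a conserved score collapses entire layers).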

Cyclical Learning Rates for Training Neural Networks

  • L. Smith
  • Computer Science
    2017 IEEE Winter Conference on Applications of Computer Vision (WACV)
  • 2017
A new method for setting the learning rate, named cyclical learning rates, is described, which practically eliminates the need to experimentally find the best values and schedule for the global learning rates.
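The simplest instance of this policy is the triangular schedule, where the learning rate ramps linearly between a lower and an upper bound over a fixed half-cycle. A minimal sketch following the triangular formulation (the default bounds and step size here are illustrative, not prescribed values):

```python
def triangular_clr(iteration, base_lr=0.001, max_lr=0.006, step_size=2000):
    """Triangular cyclical learning rate: ramp linearly from base_lr to max_lr
    over step_size iterations, then back down; one full cycle = 2 * step_size."""
    cycle = iteration // (2 * step_size)
    x = abs(iteration / step_size - 2 * cycle - 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)
```

Sweeping one cycle with a linearly increasing rate (the "LR range test" from the same paper) is also how the bounds base_lr and max_lr are typically chosen in practice.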