COPS: Controlled Pruning Before Training Starts

Paul Wimmer, Jens Mehnert, Alexandru Condurache
State-of-the-art deep neural network (DNN) pruning techniques, applied one-shot before training starts, evaluate sparse architectures with the help of a single criterion, called the pruning score. Pruning weights based on a solitary score works well for some architectures and pruning rates but may fail for others. As a common baseline for pruning scores, we introduce the notion of a generalized synaptic score (GSS). In this work we do not concentrate on a single pruning criterion, but…
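The abstract is truncated before the GSS is defined, but the single-criterion scores it generalizes commonly take the form of a saliency such as |w · ∂L/∂w|. A minimal numpy sketch of one-shot, score-based pruning on a toy linear model (all names and the SNIP-style score are illustrative, not the paper's method):

```python
import numpy as np

def prune_by_score(w, score, keep_frac):
    """One-shot pruning: keep the top `keep_frac` of weights by score, zero the rest."""
    k = max(1, int(round(keep_frac * w.size)))
    thresh = np.sort(score.ravel())[-k]          # k-th largest score
    mask = (score >= thresh).astype(w.dtype)
    return w * mask, mask

# Toy linear model y = x @ w with squared loss, so the gradient is analytic.
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 8))
w = rng.normal(size=(8, 4))
y = rng.normal(size=(32, 4))

grad = x.T @ (x @ w - y) / len(x)    # dL/dw for L = 0.5/N * ||x w - y||^2
score = np.abs(w * grad)             # SNIP-style saliency |w * dL/dw|
w_sparse, mask = prune_by_score(w, score, keep_frac=0.25)
print(mask.mean())                   # 0.25: a quarter of the weights survive
```

Different choices of `score` in this template (magnitude, gradient flow, synaptic flow) reproduce the different criteria compared below.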

References from this paper


Pruning neural networks without any data by iteratively conserving synaptic flow
The proposed data-agnostic pruning algorithm challenges the existing paradigm that, at initialization, data must be used to quantify which synapses are important, and consistently matches or outperforms existing state-of-the-art pruning-at-initialization algorithms over a range of models, datasets, and sparsity constraints.
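The idea can be sketched without any data for a two-layer linear chain, where the synaptic-flow objective R = 1ᵀ|W₁||W₂|1 has closed-form gradients; the score |w| · ∂R/∂|w| is non-negative and conserved per layer. Function names and the exponential pruning schedule are illustrative, not the authors' code:

```python
import numpy as np

def synflow_scores(w1, w2):
    """Data-free synaptic-flow scores for a two-layer linear chain.
    With R = 1^T |W1| |W2| 1, the score |w| * dR/d|w| sums to R in each layer."""
    a1, a2 = np.abs(w1), np.abs(w2)
    g1 = np.ones((a1.shape[0], 1)) * a2.sum(axis=1)[None, :]   # dR/d|W1|
    g2 = a1.sum(axis=0)[:, None] * np.ones((1, a2.shape[1]))   # dR/d|W2|
    return a1 * g1, a2 * g2

def iterative_synflow(w1, w2, target_keep, rounds=5):
    """Prune to `target_keep` density over several rounds, rescoring each round."""
    m1, m2 = np.ones_like(w1), np.ones_like(w2)
    for r in range(1, rounds + 1):
        keep = target_keep ** (r / rounds)       # exponential density schedule
        s1, s2 = synflow_scores(w1 * m1, w2 * m2)
        flat = np.concatenate([s1.ravel(), s2.ravel()])
        k = max(1, int(round(keep * flat.size)))
        thresh = np.sort(flat)[-k]               # global k-th largest score
        m1, m2 = (s1 >= thresh).astype(float), (s2 >= thresh).astype(float)
    return w1 * m1, w2 * m2

rng = np.random.default_rng(1)
w1, w2 = rng.normal(size=(8, 6)), rng.normal(size=(6, 4))
p1, p2 = iterative_synflow(w1, w2, target_keep=0.2)
density = (np.count_nonzero(p1) + np.count_nonzero(p2)) / (w1.size + w2.size)
```

Rescoring after every round is what avoids the layer-collapse failure mode of one-shot scoring at high sparsity.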
Learning Multiple Layers of Features from Tiny Images
It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.
Gradient-based learning applied to document recognition
This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task, showing that convolutional neural networks outperform all other techniques.
Faster Dynamic Matrix Inverse for Faster LPs
This data structure is based on a recursive application of the Sherman-Morrison-Woodbury identity for implementing low-rank updates, combined with recent sketching technology, and leads to the fastest known LP solver for general (dense) linear programs.
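The rank-1 case of that identity lets a cached inverse be updated in O(n²) instead of recomputed in O(n³): (A + uvᵀ)⁻¹ = A⁻¹ − A⁻¹uvᵀA⁻¹ / (1 + vᵀA⁻¹u). A quick numpy check of the formula (illustrative, not the paper's recursive data structure):

```python
import numpy as np

def sherman_morrison_update(A_inv, u, v):
    """Rank-1 update of a cached inverse: returns (A + u v^T)^{-1} in O(n^2)."""
    Au = A_inv @ u
    vA = v @ A_inv
    denom = 1.0 + v @ Au          # must be nonzero for the update to exist
    return A_inv - np.outer(Au, vA) / denom

rng = np.random.default_rng(2)
n = 5
A = rng.normal(size=(n, n)) + n * np.eye(n)   # diagonally dominated, well-conditioned
u, v = rng.normal(size=n), rng.normal(size=n)

updated = sherman_morrison_update(np.linalg.inv(A), u, v)
direct = np.linalg.inv(A + np.outer(u, v))
print(np.allclose(updated, direct))  # True
```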
Picking Winning Tickets Before Training by Preserving Gradient Flow
This work argues that efficient training requires preserving the gradient flow through the network, and proposes a simple but effective pruning criterion called Gradient Signal Preservation (GraSP), which achieves significantly better performance than the baseline at extreme sparsity levels.
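The criterion can be sketched on a model whose Hessian is available in closed form: to first order, removing weight q changes the gradient flow gᵀg in proportion to −w_q(Hg)_q, so the weights with the largest w · Hg are kept. A toy least-squares version (illustrative, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=(64, 8))
w = rng.normal(size=(8, 4))
y = rng.normal(size=(64, 4))

# For L = 0.5/N ||x w - y||^2 both the gradient and the Hessian are analytic;
# the Hessian acts identically on every output column of w.
g = x.T @ (x @ w - y) / len(x)   # dL/dw
H = x.T @ x / len(x)             # per-column Hessian
Hg = H @ g                       # Hessian-gradient product

# Removing weight q changes the gradient flow g^T g by roughly -2 w_q (Hg)_q,
# so keeping the largest w * Hg preserves the most gradient flow.
score = w * Hg
k = int(round(0.5 * w.size))
thresh = np.sort(score.ravel())[-k]
mask = (score >= thresh).astype(w.dtype)
print(mask.mean())               # 0.5: half the weights survive
```

For real networks the Hessian is never formed explicitly; a Hessian-vector product via a second backward pass gives the same `Hg`.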
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
This work finds that dense, randomly-initialized, feed-forward networks contain subnetworks ("winning tickets") that - when trained in isolation - reach test accuracy comparable to the original network in a similar number of iterations, and articulates the "lottery ticket hypothesis".
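The procedure behind the hypothesis, iterative magnitude pruning with rewinding to the original initialization, can be sketched on a toy least-squares model (training loop, pruning fraction, and names are illustrative):

```python
import numpy as np

def train(w0, mask, x, y, steps=200, lr=0.1):
    """Masked gradient descent on L = 0.5/N ||x w - y||^2."""
    w = w0 * mask
    for _ in range(steps):
        g = x.T @ (x @ w - y) / len(x)
        w = (w - lr * g) * mask          # pruned weights stay at zero
    return w

rng = np.random.default_rng(4)
x = rng.normal(size=(128, 10))
w_init = rng.normal(size=(10, 1))
y = x @ rng.normal(size=(10, 1))         # realizable regression target

mask = np.ones_like(w_init)
for _ in range(3):                        # iterative magnitude pruning
    w = train(w_init, mask, x, y)
    alive = np.abs(w[mask == 1])
    thresh = np.quantile(alive, 0.3)      # drop the smallest 30% of survivors
    mask = mask * (np.abs(w) > thresh)

ticket = train(w_init, mask, x, y)        # retrain the "winning ticket" from w_init
```

The key step is the rewind: each retraining starts from the original `w_init`, not from the weights found in the previous round.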
Deep Residual Learning for Image Recognition
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Very Deep Convolutional Networks for Large-Scale Image Recognition
This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
A Signal Propagation Perspective for Pruning Neural Networks at Initialization
By viewing connection sensitivity as a form of gradient, this work formally characterizes initialization conditions that ensure reliable connection sensitivity measurements, which in turn yield effective pruning results; the resulting modifications to the existing pruning-at-initialization method improve results on all tested network models for image classification tasks.
Pruning via Iterative Ranking of Sensitivity Statistics
This work shows that applying the sensitivity criterion iteratively in smaller steps - still before training - improves its performance without a difficult implementation, and demonstrates how it can be applied for both structured and unstructured pruning, before and/or during training, achieving state-of-the-art sparsity-performance trade-offs.
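Iterating a sensitivity score in smaller steps can be sketched as: rescore the surviving weights after each pruning round instead of committing in one shot (toy least-squares model; the density schedule and names are illustrative, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=(64, 8))
w = rng.normal(size=(8, 4))
y = rng.normal(size=(64, 4))

mask = np.ones_like(w)
target, rounds = 0.1, 4                       # 10% final density, reached in 4 steps
for r in range(1, rounds + 1):
    g = x.T @ (x @ (w * mask) - y) / len(x)   # gradient of the masked network
    score = np.abs(w * mask * g)              # sensitivity of surviving weights only
    keep = target ** (r / rounds)             # shrink density gradually
    k = max(1, int(round(keep * w.size)))
    thresh = np.sort(score.ravel())[-k]
    mask = (score >= thresh).astype(w.dtype)
```

Because already-pruned weights score zero, each round's threshold is set only by the weights still alive in the current subnetwork.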