Only Train Once: A One-Shot Neural Network Training And Pruning Framework
@inproceedings{Chen2021OnlyTO,
  title     = {Only Train Once: A One-Shot Neural Network Training And Pruning Framework},
  author    = {Tianyi Chen and Bo Ji and Tianyu Ding and Biyi Fang and Guanyi Wang and Zhihui Zhu and Luming Liang and Yixin Shi and Sheng Yi and Xiao Tu},
  booktitle = {Neural Information Processing Systems},
  year      = {2021}
}
Structured pruning is a commonly used technique for deploying deep neural networks (DNNs) onto resource-constrained devices. However, existing pruning methods are usually heuristic, task-specific, and require an extra fine-tuning procedure. To overcome these limitations, we propose a framework that compresses DNNs into slimmer architectures with competitive performance and significant FLOPs reduction via Only-Train-Once (OTO). OTO contains two keys: (i) we partition the parameters of DNNs…
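To make the grouped-sparsity idea concrete, here is a minimal PyTorch-style sketch (illustrative only, not the paper's actual partition or optimizer): each output filter of a convolution is treated as one parameter group, a mixed l2,1 penalty pushes whole groups toward zero, and groups whose norm vanishes can be removed as entire filters. The `group_l2_penalty` helper and the 1e-3 threshold are assumptions for illustration.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)

def group_l2_penalty(weight: torch.Tensor) -> torch.Tensor:
    """Sum of per-filter l2 norms (mixed l2,1 penalty); hypothetical helper."""
    # weight shape: (out_channels, in_channels, kH, kW); one group per output filter
    return weight.flatten(1).norm(dim=1).sum()

# During training, lambda * group_l2_penalty(conv.weight) would be added to the task loss.
with torch.no_grad():
    norms = conv.weight.flatten(1).norm(dim=1)
    keep = norms > 1e-3            # filters whose group norm survived training
print(f"{int(keep.sum())} of {conv.out_channels} filters would be kept")
```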
26 Citations
Learning Pruning-Friendly Networks via Frank-Wolfe: One-Shot, Any-Sparsity, And No Retraining
- Computer Science · ICLR
- 2022
A novel framework to train a large deep neural network only once, after which it can be pruned to any sparsity ratio while preserving competitive accuracy without any re-training, together with a stochastic Frank-Wolfe (SFW) algorithm to solve the resulting constrained optimization problem.
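As a rough illustration of why a Frank-Wolfe iteration keeps iterates sparse (a minimal sketch under assumed names, not the paper's exact algorithm): over an l1-ball constraint, the linear minimization oracle returns a 1-sparse vertex, so each update is a convex combination of the current weights and a single-coordinate direction.

```python
import numpy as np

def sfw_step(w, stochastic_grad, radius=10.0, step=0.05):
    # Linear minimization oracle for the l1-ball of the given radius:
    # the minimizing vertex is -radius * sign(g_i) on the largest-|g| coordinate.
    i = np.argmax(np.abs(stochastic_grad))
    vertex = np.zeros_like(w)
    vertex[i] = -radius * np.sign(stochastic_grad[i])
    return (1.0 - step) * w + step * vertex   # convex combination stays feasible

w = np.zeros(1000)
g = np.random.randn(1000)                     # stand-in for a minibatch gradient
w = sfw_step(w, g)
```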
Topology-Aware Network Pruning using Multi-stage Graph Embedding and Reinforcement Learning
- Computer Science · ICML
- 2022
A novel multi-stage graph embedding technique based on graph neural networks (GNNs) to identify DNN topologies, combined with reinforcement learning (RL) to find a suitable compression policy, which achieves higher compression ratios with minimal tuning cost while yielding outstanding and competitive performance.
Deep Neural Networks pruning via the Structured Perspective Regularization
- Computer Science · ArXiv
- 2022
This work proposes a new pruning method based on Operational Research tools that leads to structured pruning of the initial architecture, starting from a natural Mixed-Integer-Programming model of the problem and using the Perspective Reformulation technique to strengthen its continuous relaxation.
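For readers unfamiliar with the technique, the following sketch shows the generic perspective-reformulation pattern under assumed notation (a binary indicator z_i switching weight w_i on or off); the paper's actual model details may differ.

```latex
% Generic perspective reformulation (assumed notation, not the paper's exact model):
% z_i switches w_i on/off, and the quadratic penalty w_i^2 is replaced by its
% perspective w_i^2 / z_i, which tightens the continuous relaxation
% (with the convention w_i^2 / z_i := 0 when w_i = z_i = 0).
\begin{aligned}
\min_{w,\, z}\quad & L(w) + \lambda \sum_i \frac{w_i^2}{z_i} \\
\text{s.t.}\quad   & |w_i| \le M z_i, \qquad z_i \in \{0,1\}
                     \;\;\bigl(\text{relaxed to } z_i \in [0,1]\bigr)
\end{aligned}
```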
DNN pruning with principal component analysis and connection importance estimation
- Computer Science · J. Syst. Archit.
- 2022
Sparsity-guided Network Design for Frame Interpolation
- Computer Science · ArXiv
- 2022
A compression-driven network design for frame interpolation that leverages model pruning through sparsity-inducing optimization to greatly reduce the model size while attaining higher performance.
Receding Neuron Importances for Structured Pruning
- Computer Science · ArXiv
- 2022
A novel regularisation term is designed, focused on shrinking only neurons of lesser importance, with its gradient decaying exponentially for neurons of higher importance; it outperforms related approaches on VGG models and shows that severe degradation can be attributed to over-pruning early layers of the network.
One-shot Network Pruning at Initialization with Discriminative Image Patches
- Computer Science · ArXiv
- 2022
This paper proposes two novel methods, Discriminative One-shot Network Pruning (DOP) and Super Stitching, to prune the network using high-level discriminative image patches, and reveals that one-shot pruning at initialization (OPaI) is data-dependent.
EAPruning: Evolutionary Pruning for Vision Transformers and CNNs
- Computer Science · ArXiv
- 2022
A simple and effective evolutionary pruning approach that can be easily applied to both vision transformers and convolutional neural networks, in which pruned sub-networks inherit weights through reconstruction techniques.
CrAM: A Compression-Aware Minimizer
- Computer Science · ArXiv
- 2022
A new compression-aware minimizer dubbed CrAM is proposed, which modifies the SGD training iteration in a principled way, in order to produce models whose local loss behavior is stable under compression operations such as weight pruning or quantization.
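As a rough illustration of "compression-aware" training (a simplified sketch under assumed names, not CrAM's exact iteration), the update below evaluates the gradient at a magnitude-pruned copy of the weights and applies it to the dense weights, so training favors parameters whose loss is stable under pruning. Here `loss_fn` is a hypothetical closure that takes the (pruned) parameter list and returns the minibatch loss.

```python
import torch

def compression_aware_step(params, loss_fn, lr=0.1, sparsity=0.5):
    # Magnitude-prune a detached copy of the weights (drop the smallest `sparsity` fraction).
    flat = torch.cat([p.detach().flatten() for p in params])
    k = int(sparsity * flat.numel())
    threshold = flat.abs().kthvalue(k).values if k > 0 else flat.new_tensor(0.0)
    pruned = [torch.where(p.detach().abs() > threshold, p.detach(),
                          torch.zeros_like(p)).requires_grad_(True)
              for p in params]
    # Gradient of the loss evaluated at the compressed point ...
    grads = torch.autograd.grad(loss_fn(pruned), pruned)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= lr * g                       # ... applied to the dense weights
```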
AutoDistil: Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models
- Computer Science · ArXiv
- 2022
Experiments on the GLUE benchmark demonstrate that AutoDistil outperforms state-of-the-art KD and NAS methods with up to a 41x reduction in computational cost.
References
Showing 1-10 of 99 references
SNIP: Single-shot Network Pruning based on Connection Sensitivity
- Computer Science · ICLR
- 2019
This work presents a new approach that prunes a given network once at initialization prior to training, and introduces a saliency criterion based on connection sensitivity that identifies structurally important connections in the network for the given task.
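A minimal sketch of the connection-sensitivity criterion as commonly described for SNIP (simplified; `loss_fn`, `inputs`, and `targets` are stand-ins for one minibatch at initialization): the saliency of each weight is |w * dL/dw|, normalized over the network, and only the top-k connections by saliency are kept before training starts.

```python
import torch

def snip_saliency(model, loss_fn, inputs, targets):
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(inputs), targets)
    grads = torch.autograd.grad(loss, params)
    # Connection sensitivity |w * dL/dw| per weight, normalized over the whole network.
    scores = torch.cat([(p * g).abs().flatten() for p, g in zip(params, grads)])
    return scores / scores.sum()
```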
Data-Driven Sparse Structure Selection for Deep Neural Networks
- Computer Science · ECCV
- 2018
A simple and effective framework to learn and prune deep models in an end-to-end manner by adding sparsity regularization on scaling factors and solving the optimization problem with a modified stochastic Accelerated Proximal Gradient (APG) method.
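As a simplified picture of the optimization ingredient mentioned above (a plain proximal step rather than the paper's accelerated variant; names are assumptions): each structure carries a scaling factor, and the l1 proximal operator, i.e. soft-thresholding, drives some factors exactly to zero so the corresponding structures can be pruned.

```python
import torch

def prox_l1_step(scales: torch.Tensor, grad: torch.Tensor, lr=0.01, lam=1e-4):
    s = scales - lr * grad                                            # gradient step on the factors
    return torch.sign(s) * torch.clamp(s.abs() - lr * lam, min=0.0)   # soft-thresholding
```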
ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression
- Computer Science · 2017 IEEE International Conference on Computer Vision (ICCV)
- 2017
ThiNet is proposed, an efficient and unified framework to simultaneously accelerate and compress CNN models in both training and inference stages; it reveals that filters need to be pruned based on statistics computed from the next layer rather than the current layer, which differentiates ThiNet from existing methods.
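The next-layer criterion can be sketched as a greedy subset search (a simplified sketch with an assumed data layout: `contribs[n, c]` is channel c's contribution to one sampled output location of the next layer for sample n); at each step the channel whose removal adds the least reconstruction error is discarded.

```python
import torch

def thinet_select(contribs: torch.Tensor, n_remove: int):
    removed = []
    remaining = list(range(contribs.shape[1]))
    for _ in range(n_remove):
        errors = []
        for c in remaining:
            trial = removed + [c]
            # error = squared norm of what the removed channels contributed
            errors.append(contribs[:, trial].sum(dim=1).pow(2).sum().item())
        c_best = remaining[int(torch.tensor(errors).argmin())]
        removed.append(c_best)
        remaining.remove(c_best)
    return removed                             # channels to prune
```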
The State of Sparsity in Deep Neural Networks
- Computer Science · ArXiv
- 2019
It is shown that unstructured sparse architectures learned through pruning cannot be trained from scratch to the same test set performance as a model trained with joint sparsification and optimization, and the need for large-scale benchmarks in the field of model compression is highlighted.
PruneTrain: fast neural network training by dynamic sparse model reconfiguration
- Computer Science · SC
- 2019
This work proposes PruneTrain, a cost-efficient mechanism that gradually reduces the cost of training by using a structured group-lasso regularization approach that drives the optimization toward both high accuracy and small weight values.
Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures
- Computer Science · ArXiv
- 2016
This paper introduces network trimming which iteratively optimizes the network by pruning unimportant neurons based on analysis of their outputs on a large dataset, inspired by an observation that the outputs of a significant portion of neurons in a large network are mostly zero.
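A small sketch of the zero-output statistic this idea relies on (APoZ-style; the helper name and the 0.9 threshold are assumptions): record post-ReLU activations over a dataset and flag units that output zero most of the time.

```python
import torch

def average_percentage_of_zeros(activations: torch.Tensor) -> torch.Tensor:
    # activations: (num_samples, num_units) post-ReLU outputs collected over a dataset
    return (activations == 0).float().mean(dim=0)

acts = torch.relu(torch.randn(512, 64))       # stand-in for recorded activations
apoz = average_percentage_of_zeros(acts)
prune_candidates = (apoz > 0.9).nonzero(as_tuple=True)[0]
```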
A Survey of Model Compression and Acceleration for Deep Neural Networks
- Computer Science · ArXiv
- 2017
This paper surveys recent advanced techniques for compacting and accelerating CNN models, roughly categorized into four schemes: parameter pruning and sharing, low-rank factorization, transferred/compact convolutional filters, and knowledge distillation.
Operation-Aware Soft Channel Pruning using Differentiable Masks
- Computer Science · ICML
- 2020
A simple but effective data-driven channel pruning algorithm, which compresses deep neural networks in a differentiable way by exploiting the characteristics of operations, and helps to explore a larger search space and train more stable networks.
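An illustrative sketch of the differentiable-mask idea (simplified, not the paper's exact operation-aware formulation): a sigmoid gate per output channel scales the conv output, and a penalty on the gate values softly prunes channels during training; gated channels can be removed afterwards.

```python
import torch
import torch.nn as nn

class GatedConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.gate_logits = nn.Parameter(torch.zeros(out_ch))

    def forward(self, x):
        gates = torch.sigmoid(self.gate_logits)          # soft channel mask in (0, 1)
        return self.conv(x) * gates.view(1, -1, 1, 1)

    def sparsity_penalty(self):
        return torch.sigmoid(self.gate_logits).sum()     # pushes gates toward zero
```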
Learning Structured Sparsity in Deep Neural Networks
- Computer Science · NIPS
- 2016
The results show that, for CIFAR-10, regularization on layer depth can reduce a 20-layer Deep Residual Network to 18 layers while improving the accuracy from 91.25% to 92.60%, which is still slightly higher than that of the original ResNet with 32 layers.
Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression
- Computer Science · 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
This paper analyzes two popular network compression techniques, i.e. filter pruning and low-rank decomposition, in a unified sense and proposes to compress the whole network jointly instead of in a layer-wise manner.