• Corpus ID: 235899080

Only Train Once: A One-Shot Neural Network Training And Pruning Framework

Tianyi Chen, Bo Ji, Tianyu Ding, Biyi Fang, Guanyi Wang, Zhihui Zhu, Luming Liang, Yixin Shi, Sheng Yi, Xiao Tu
Neural Information Processing Systems

Structured pruning is a commonly used technique in deploying deep neural networks (DNNs) onto resource-constrained devices. However, the existing pruning methods are usually heuristic, task-specified, and require an extra fine-tuning procedure. To overcome these limitations, we propose a framework that compresses DNNs into slimmer architectures with competitive performances and significant FLOPs reductions by Only-Train-Once (OTO). OTO contains two keys: (i) we partition the parameters of DNNs… 

Learning Pruning-Friendly Networks via Frank-Wolfe: One-Shot, Any-Sparsity, And No Retraining

A novel framework that trains a large deep neural network only once, after which it can be pruned to any sparsity ratio while preserving competitive accuracy without any retraining, together with a stochastic Frank-Wolfe (SFW) algorithm to solve the resulting constrained optimization problem.
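To illustrate the kind of projection-free update that Frank-Wolfe methods rely on, here is a minimal deterministic sketch. It is not the paper's SFW algorithm: the L1-ball constraint, the quadratic objective, and the function names `frank_wolfe_step` and `minimize_quadratic` are all assumptions chosen for illustration, since the summary above does not state the constraint set or the loss.

```python
def frank_wolfe_step(x, grad, radius, step):
    """One Frank-Wolfe update over the L1 ball {v : ||v||_1 <= radius}.

    The L1 ball is an illustrative assumption, not the paper's constraint set.
    """
    # Linear minimization oracle: the vertex of the L1 ball minimizing <grad, v>
    # puts all mass on the coordinate with the largest gradient magnitude.
    i = max(range(len(grad)), key=lambda k: abs(grad[k]))
    vertex = [0.0] * len(x)
    if grad[i] != 0:
        vertex[i] = -radius if grad[i] > 0 else radius
    # A convex combination of feasible points stays feasible, so no projection is needed.
    return [(1 - step) * xi + step * vi for xi, vi in zip(x, vertex)]


def minimize_quadratic(target, radius, iters=200):
    """Minimize 0.5 * ||x - target||^2 over the L1 ball (toy objective)."""
    x = [0.0] * len(target)
    for t in range(iters):
        grad = [xi - ti for xi, ti in zip(x, target)]
        x = frank_wolfe_step(x, grad, radius, step=2.0 / (t + 2))
    return x


if __name__ == "__main__":
    print(minimize_quadratic([3.0, 0.1, -0.2], radius=1.0))
```

Because every iterate is a convex combination of sparse vertices, the iterates tend to stay sparse, which is one reason this family of methods is attractive for pruning-oriented training.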

Topology-Aware Network Pruning using Multi-stage Graph Embedding and Reinforcement Learning

A novel multi-stage graph embedding technique based on graph neural networks (GNNs) to identify DNN topologies, combined with reinforcement learning (RL) to find a suitable compression policy, which achieves higher compression ratios at minimal tuning cost while maintaining competitive performance.

Deep Neural Networks pruning via the Structured Perspective Regularization

This work proposes a new pruning method based on Operational Research tools that leads to structured pruning of the initial architecture. It starts from a natural Mixed-Integer-Programming model for the problem and uses the Perspective Reformulation technique to strengthen its continuous relaxation.

Sparsity-guided Network Design for Frame Interpolation

A compression-driven network design for frame interpolation that leverages model pruning through sparsity-inducing optimization to greatly reduce the model size while attaining higher performance.

Receding Neuron Importances for Structured Pruning

A novel regularisation term is designed that shrinks only neurons of lesser importance, with its gradient decaying exponentially for higher importances. The method outperforms related approaches for VGG models, and the analysis shows that severe degradation can be attributed to over-pruning the early layers of the network.

One-shot Network Pruning at Initialization with Discriminative Image Patches

This paper proposes two novel methods, Discriminative One-shot Network Pruning (DOP) and Super Stitching, which prune the network using high-level, visually discriminative image patches, and reveals that one-shot network pruning at initialization (OPaI) is data-dependent.

EAPruning: Evolutionary Pruning for Vision Transformers and CNNs

A simple and effective approach that can be easily applied to both vision transformers and convolutional neural networks, in which pruned sub-networks inherit weights through reconstruction techniques.

CrAM: A Compression-Aware Minimizer

A new compression-aware minimizer dubbed CrAM is proposed, which modifies the SGD training iteration in a principled way, in order to produce models whose local loss behavior is stable under compression operations such as weight pruning or quantization.

AutoDistil: Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models

Experiments on the GLUE benchmark show that AutoDistil outperforms state-of-the-art KD and NAS methods with up to 41x reduction in computational cost.



SNIP: Single-shot Network Pruning based on Connection Sensitivity

This work presents a new approach that prunes a given network once at initialization prior to training, and introduces a saliency criterion based on connection sensitivity that identifies structurally important connections in the network for the given task.
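The connection-sensitivity criterion can be sketched as follows. This is a minimal illustration under assumptions: it scores each connection by the normalized magnitude of gradient times weight, takes flat Python lists in place of real network tensors, and uses the hypothetical helper names `connection_sensitivity` and `snip_mask`. The gradients are assumed to come from a single mini-batch at initialization.

```python
def connection_sensitivity(weights, grads):
    """Normalized saliency |g * w| for each connection."""
    raw = [abs(g * w) for w, g in zip(weights, grads)]
    total = sum(raw)
    # Normalize so the scores sum to 1; leave all-zero scores untouched.
    return [r / total for r in raw] if total > 0 else raw


def snip_mask(weights, grads, keep):
    """Binary mask that keeps the `keep` connections with the highest saliency."""
    scores = connection_sensitivity(weights, grads)
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:keep]
    mask = [0] * len(scores)
    for i in top:
        mask[i] = 1
    return mask


if __name__ == "__main__":
    w = [0.5, -1.0, 2.0, 0.1]   # illustrative initial weights
    g = [0.2, 0.3, -0.2, 1.0]   # illustrative loss gradients
    print(snip_mask(w, g, keep=2))
```

The mask is computed once and then held fixed, so training proceeds on the sparse network only.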

Data-Driven Sparse Structure Selection for Deep Neural Networks

A simple and effective framework to learn and prune deep models in an end-to-end manner by adding sparsity regularizations on factors, and solving the optimization problem by a modified stochastic Accelerated Proximal Gradient (APG) method.

ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression

ThiNet is proposed, an efficient and unified framework to simultaneously accelerate and compress CNN models in both training and inference stages. It shows that filters should be pruned based on statistical information computed from the next layer, not the current layer, which differentiates ThiNet from existing methods.

The State of Sparsity in Deep Neural Networks

It is shown that unstructured sparse architectures learned through pruning cannot be trained from scratch to the same test set performance as a model trained with joint sparsification and optimization, and the need for large-scale benchmarks in the field of model compression is highlighted.

PruneTrain: fast neural network training by dynamic sparse model reconfiguration

This work proposes PruneTrain, a cost-efficient mechanism that gradually reduces the training cost during training by using a structured group-lasso regularization approach that drives the training optimization toward both high accuracy and small weight values.
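A group-lasso regularizer of the kind mentioned above can be sketched in a few lines. This is a generic illustration, not PruneTrain's implementation: the grouping (one list of weights per channel or filter), the coefficient `lam`, the tolerance `tol`, and the function names are assumptions.

```python
import math


def group_norm(group):
    """L2 norm of one group of weights (e.g. one filter or channel)."""
    return math.sqrt(sum(w * w for w in group))


def group_lasso_penalty(groups, lam):
    """lam times the sum of per-group L2 norms; added to the training loss."""
    return lam * sum(group_norm(g) for g in groups)


def prunable_groups(groups, tol=1e-8):
    """Indices of groups whose weights have all collapsed to (near) zero."""
    return [i for i, g in enumerate(groups) if group_norm(g) <= tol]


if __name__ == "__main__":
    groups = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
    print(group_lasso_penalty(groups, lam=0.1))
    print(prunable_groups(groups))
```

Because the penalty acts on whole-group norms rather than individual weights, it pushes entire groups toward zero together, which is what allows structured removal of channels or filters.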

Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures

This paper introduces network trimming which iteratively optimizes the network by pruning unimportant neurons based on analysis of their outputs on a large dataset, inspired by an observation that the outputs of a significant portion of neurons in a large network are mostly zero.
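The "mostly zero outputs" observation can be turned into a per-neuron statistic, which the Network Trimming paper calls the Average Percentage of Zeros (APoZ). The sketch below is a simplified illustration: the data layout (`activations[n][c]` as the post-ReLU output of neuron `c` on sample `n`), the pruning threshold, and the function names are assumptions.

```python
def apoz(activations):
    """Fraction of samples on which each neuron outputs exactly zero.

    activations[n][c] is the post-ReLU output of neuron c on sample n.
    """
    n_samples = len(activations)
    n_neurons = len(activations[0])
    return [
        sum(1 for row in activations if row[c] == 0) / n_samples
        for c in range(n_neurons)
    ]


def neurons_to_prune(activations, threshold):
    """Indices of neurons whose zero fraction exceeds the (assumed) threshold."""
    return [c for c, z in enumerate(apoz(activations)) if z > threshold]


if __name__ == "__main__":
    acts = [
        [0.0, 1.2, 0.0],
        [0.0, 0.0, 0.3],
        [0.5, 0.7, 0.0],
        [0.0, 2.0, 0.0],
    ]
    print(apoz(acts))
    print(neurons_to_prune(acts, threshold=0.5))
```

In an iterative scheme of the kind the summary describes, the network would be retrained after each pruning round and the statistic recomputed.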

A Survey of Model Compression and Acceleration for Deep Neural Networks

This paper surveys recent advanced techniques for compacting and accelerating CNN models, roughly categorized into four schemes: parameter pruning and sharing, low-rank factorization, transferred/compact convolutional filters, and knowledge distillation.

Operation-Aware Soft Channel Pruning using Differentiable Masks

A simple but effective data-driven channel pruning algorithm, which compresses deep neural networks in a differentiable way by exploiting the characteristics of operations, and helps to explore larger search space and train more stable networks.

Learning Structured Sparsity in Deep Neural Networks

The results show that for CIFAR-10, regularization on layer depth can reduce 20 layers of a Deep Residual Network to 18 layers while improving the accuracy from 91.25% to 92.60%, which is still slightly higher than that of the original ResNet with 32 layers.

Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression

This paper analyzes two popular network compression techniques, i.e. filter pruning and low-rank decomposition, in a unified sense and proposes to compress the whole network jointly instead of in a layer-wise manner.