Compact representations of convolutional neural networks via weight pruning and quantization
@article{Marin2021CompactRO,
  title   = {Compact representations of convolutional neural networks via weight pruning and quantization},
  author  = {Giosu{\`e} Cataldo Marin{\`o} and Alessandro Petrini and Dario Malchiodi and Marco Frasca},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2108.12704}
}
The state-of-the-art performance for several real-world problems is currently reached by convolutional neural networks (CNNs). Such learning models exploit recent results in the field of deep learning, typically leading to highly performing, yet very large neural networks with (at least) millions of parameters. As a result, the deployment of such models is not possible when only small amounts of RAM are available, or in general within resource-limited platforms, and strategies to compress CNNs…
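To make the two ingredients in the title concrete, here is a minimal NumPy/scikit-learn sketch of magnitude-based weight pruning followed by weight-sharing quantization. The 90% pruning ratio and the 32-entry codebook are arbitrary illustrative choices, not values taken from the paper.

```python
# Illustrative sketch of magnitude pruning followed by weight-sharing
# quantization (k-means over the surviving weights). Not the paper's code;
# the pruning ratio and number of centroids below are arbitrary choices.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 512)).astype(np.float32)   # a dense layer's weights

# 1) Magnitude pruning: zero out the 90% of weights with smallest |w|.
threshold = np.quantile(np.abs(W), 0.9)
mask = np.abs(W) >= threshold
W_pruned = W * mask

# 2) Weight quantization: cluster the surviving weights into 32 shared values.
nonzero = W_pruned[mask].reshape(-1, 1)
kmeans = KMeans(n_clusters=32, n_init=10, random_state=0).fit(nonzero)
codebook = kmeans.cluster_centers_.ravel()            # 32 shared values
indices = kmeans.predict(nonzero)                     # 5-bit index per surviving weight

# The compressed representation is the binary mask, the 5-bit indices,
# and the 32-entry codebook, instead of 256*512 float32 values.
W_quantized = np.zeros_like(W)
W_quantized[mask] = codebook[indices]
print("kept weights:", mask.sum(), "unique values:", np.unique(W_quantized).size)
```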
References
Showing 1-10 of 63 references
Reproducing the Sparse Huffman Address Map Compression for Deep Neural Networks
- Computer Science, RRPR, 2021
The proposed implementation, which is described in this paper, offers different compression schemes (pruning, two types of weight quantization, and their combinations) and two compact representations: the Huffman Address Map compression (HAM), and its sparse version sHAM.
A Survey of Model Compression and Acceleration for Deep Neural Networks
- Computer Science, ArXiv, 2017
This paper surveys recent advanced techniques for compacting and accelerating CNN models, roughly categorized into four schemes: parameter pruning and sharing, low-rank factorization, transferred/compact convolutional filters, and knowledge distillation.
Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding
- Computer Science, ICLR, 2016
This work introduces "deep compression", a three-stage pipeline of pruning, trained quantization, and Huffman coding that together reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.
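As a rough illustration of the third stage only, the sketch below Huffman-codes a stream of quantized weight indices and compares the bit cost against fixed-width coding. The skewed toy index distribution is invented, and the pruning and quantization stages are assumed to have already run.

```python
# Minimal sketch of the Huffman-coding stage: given quantized weight indices
# (small integers with a skewed distribution), build a prefix code and measure
# the resulting bit cost. Toy data only; not the paper's implementation.
import heapq
from collections import Counter
import numpy as np

rng = np.random.default_rng(0)
# Pretend these are 5-bit cluster indices of pruned+quantized weights,
# with a skewed (hence compressible) distribution.
p = np.geomspace(1.0, 0.01, 32)
indices = rng.choice(32, size=10_000, p=p / p.sum())

def huffman_code_lengths(symbols):
    """Return ({symbol: code length in bits}, frequencies) for a Huffman code."""
    freq = Counter(symbols)
    # Heap items: (frequency, tie-breaker, {symbol: depth-so-far})
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2], freq

lengths, freq = huffman_code_lengths(indices.tolist())
bits = sum(freq[s] * l for s, l in lengths.items())
print(f"fixed 5-bit coding: {5 * len(indices)} bits, Huffman: {bits} bits")
```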
Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights
- Computer Science, ICLR, 2017
Extensive experiments on the ImageNet classification task using almost all known deep CNN architectures, including AlexNet, VGG-16, GoogleNet and ResNets, testify to the efficacy of the proposed INQ, showing that at 5-bit quantization the models achieve better accuracy than their 32-bit floating-point references.
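The core quantization step of such incremental schemes can be sketched as snapping a fraction of the weights to powers of two while the rest remain trainable. The fraction, the exponent range, and the omission of the retraining loop below are simplifications, not INQ's actual schedule.

```python
# Sketch of the quantize-to-powers-of-two step used by incremental schemes:
# the largest-magnitude half of the weights is snapped to the nearest value
# in {0} ∪ {±2^n}, and the rest would be retrained before the next round
# (retraining is omitted here). Illustrative only.
import numpy as np

def power_of_two_quantize(W, fraction=0.5, n1=-1, n2=-6):
    """Quantize the largest `fraction` of weights (by magnitude) to powers of two."""
    levels = np.array([0.0] + [2.0 ** n for n in range(n2, n1 + 1)])
    cutoff = np.quantile(np.abs(W), 1.0 - fraction)
    to_quantize = np.abs(W) >= cutoff
    # Snap each selected weight to the closest level, keeping its sign.
    mags = np.abs(W[to_quantize])
    nearest = levels[np.argmin(np.abs(mags[:, None] - levels[None, :]), axis=1)]
    W_q = W.copy()
    W_q[to_quantize] = np.sign(W[to_quantize]) * nearest
    return W_q, to_quantize

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(64, 64))
W_q, frozen = power_of_two_quantize(W)
print("quantized weights:", frozen.sum(),
      "distinct magnitudes:", np.unique(np.abs(W_q[frozen])).size)
```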
Value-aware Quantization for Training and Inference of Neural Networks
- Computer Science, ECCV, 2018
We propose a novel value-aware quantization which applies aggressively reduced precision to the majority of data while separately handling a small amount of large data in high precision, which…
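A minimal sketch of the value-aware idea follows, assuming a 1% outlier budget and 4-bit uniform quantization for the remaining values (both arbitrary choices):

```python
# Rough sketch of value-aware quantization: keep a small fraction of
# large-magnitude values in full precision and quantize the remaining
# majority to a narrow uniform grid. Fractions and bit width are arbitrary.
import numpy as np

def value_aware_quantize(x, outlier_fraction=0.01, bits=4):
    cutoff = np.quantile(np.abs(x), 1.0 - outlier_fraction)
    outliers = np.abs(x) > cutoff
    small = x[~outliers]
    # Uniform quantization of the non-outlier majority.
    scale = np.max(np.abs(small)) / (2 ** (bits - 1) - 1) if small.size else 1.0
    q = np.round(small / scale).astype(np.int8)
    x_hat = x.copy()
    x_hat[~outliers] = q.astype(np.float32) * scale   # dequantized low-precision part
    return x_hat, outliers

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000).astype(np.float32)
x[:50] *= 20.0                                        # inject a few large values
x_hat, outliers = value_aware_quantize(x)
print("outliers kept in fp32:", outliers.sum(),
      "max error on the rest:", np.max(np.abs(x - x_hat)[~outliers]))
```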
Speeding up Convolutional Neural Networks with Low Rank Expansions
- Computer Science, BMVC, 2014
Two simple schemes for drastically speeding up convolutional neural networks are presented, achieved by exploiting cross-channel or filter redundancy to construct a low-rank basis of filters that are rank-1 in the spatial domain.
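The general flavour of such low-rank speed-ups can be illustrated by truncating the SVD of a flattened filter bank; the rank below is arbitrary, and the paper's specific construction of rank-1 spatial filters is not reproduced.

```python
# Illustrative low-rank approximation of a bank of convolutional filters:
# flatten the filters into a matrix, truncate its SVD, and compare the
# parameter counts. Captures the general idea only.
import numpy as np

rng = np.random.default_rng(0)
filters = rng.normal(size=(128, 64, 3, 3))          # (out_ch, in_ch, kH, kW)
W = filters.reshape(128, -1)                        # 128 x 576 matrix

U, S, Vt = np.linalg.svd(W, full_matrices=False)
rank = 32                                           # arbitrary target rank
W_low = (U[:, :rank] * S[:rank]) @ Vt[:rank, :]

original_params = W.size
low_rank_params = U[:, :rank].size + Vt[:rank, :].size
rel_error = np.linalg.norm(W - W_low) / np.linalg.norm(W)
print(f"params: {original_params} -> {low_rank_params}, relative error {rel_error:.3f}")
```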
On Compressing Deep Models by Low Rank and Sparse Decomposition
- Computer Science, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
A unified framework integrating the low-rank and sparse decomposition of weight matrices with the feature map reconstructions is proposed, which can significantly reduce the parameters for both convolutional and fully-connected layers.
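As a crude stand-in for such a decomposition, the sketch below alternates a truncated SVD (low-rank part) with hard thresholding of the residual (sparse part); the feature-map reconstruction term of the actual method is not modelled.

```python
# Crude stand-in for a low-rank plus sparse decomposition W ~= L + S:
# alternate a truncated SVD for L with hard thresholding of the residual
# for S. Rank, sparsity and iteration count are arbitrary.
import numpy as np

def low_rank_plus_sparse(W, rank=16, sparsity=0.05, iters=10):
    S = np.zeros_like(W)
    for _ in range(iters):
        U, sv, Vt = np.linalg.svd(W - S, full_matrices=False)
        L = (U[:, :rank] * sv[:rank]) @ Vt[:rank, :]
        R = W - L
        thr = np.quantile(np.abs(R), 1.0 - sparsity)   # keep only the largest residuals
        S = np.where(np.abs(R) >= thr, R, 0.0)
    return L, S

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))
L, S = low_rank_plus_sparse(W)
err = np.linalg.norm(W - L - S) / np.linalg.norm(W)
print("nonzeros in S:", np.count_nonzero(S), "relative error:", round(err, 3))
```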
Pruning Filters for Efficient ConvNets
- Computer Science, ICLR, 2017
This work presents an acceleration method for CNNs, showing that even simple filter pruning techniques can reduce inference costs for VGG-16 and ResNet-110 by up to 38% on CIFAR-10 while regaining close to the original accuracy by retraining the networks.
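A minimal sketch of norm-based filter pruning of the kind evaluated in this line of work, assuming an arbitrary 30% pruning ratio and omitting the retraining step:

```python
# Sketch of norm-based filter pruning: rank a convolutional layer's filters
# by their L1 norm and drop the lowest-ranked ones. The ratio is arbitrary,
# and the retraining step that recovers accuracy is not shown.
import numpy as np

rng = np.random.default_rng(0)
filters = rng.normal(size=(64, 32, 3, 3))            # (out_ch, in_ch, kH, kW)

l1_norms = np.abs(filters).reshape(64, -1).sum(axis=1)
keep = np.argsort(l1_norms)[int(0.3 * 64):]          # drop the 30% smallest filters
pruned = filters[np.sort(keep)]

print("filters kept:", pruned.shape[0], "of", filters.shape[0])
```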
Universal Deep Neural Network Compression
- Computer Science, IEEE Journal of Selected Topics in Signal Processing, 2020
This work introduces, for the first time, universal DNN compression by universal vector quantization and universal source coding. It utilizes universal lattice quantization, which randomizes the source by uniform random dithering before lattice quantization and can perform near-optimally on any source without relying on knowledge of the source distribution.
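The dithering idea can be illustrated with scalar (one-dimensional lattice) quantization: a shared uniform dither is added before rounding and subtracted after dequantization, which bounds the error by half the step size. The step size is arbitrary and the entropy-coding stage is omitted.

```python
# Toy illustration of dithered uniform (lattice) quantization: add a shared
# uniform dither before rounding to a grid, then subtract it again after
# dequantization. The universal source coding stage is not shown.
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(10_000)
step = 0.05
dither = rng.uniform(-step / 2, step / 2, size=w.shape)   # known to encoder and decoder

q = np.round((w + dither) / step)                         # integer lattice points
w_hat = q * step - dither                                 # subtractive dither reconstruction

print("max reconstruction error:", np.max(np.abs(w - w_hat)))   # bounded by step/2
```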
Importance Estimation for Neural Network Pruning
- Computer Science, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
A novel method is described that estimates the contribution of a neuron (filter) to the final loss and iteratively removes those with smaller scores, along with two variations that use first- and second-order Taylor expansions to approximate a filter's contribution.
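A minimal sketch of the first-order variant, assuming the per-weight loss gradients are already available from backpropagation (random placeholders are used here):

```python
# Sketch of first-order Taylor importance for filter pruning: with the loss
# gradient of each weight available, a filter's importance is approximated
# by the squared sum of (gradient * weight) over its parameters.
# Gradients below are random placeholders standing in for backprop output.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(64, 32, 3, 3))     # one conv layer, (out_ch, in_ch, kH, kW)
grads = rng.normal(size=weights.shape)        # would come from backpropagation

# First-order Taylor score per filter: (sum_i g_i * w_i)^2 over the filter's weights.
scores = (grads * weights).reshape(64, -1).sum(axis=1) ** 2

n_prune = 8
prune_idx = np.argsort(scores)[:n_prune]      # filters whose removal changes the loss least
print("filters to prune:", np.sort(prune_idx))
```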