Cluster Regularized Quantization for Deep Networks Compression

  • Yiming Hu, Jianquan Li, Xianlei Long, Shenhua Hu, Jiagang Zhu, Xingang Wang, Qingyi Gu
  • Published 27 February 2019
  • Computer Science
  • 2019 IEEE International Conference on Image Processing (ICIP)
Deep neural networks (DNNs) have achieved great success in a wide range of computer vision areas, but their application to mobile devices is limited by high storage and computational cost. Much effort has been devoted to compressing DNNs. In this paper, we propose a simple yet effective method for deep network compression, named Cluster Regularized Quantization (CRQ), which can reduce the representation precision of a full-precision model to ternary values without significant accuracy…
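The ternary idea in the abstract can be illustrated with a generic magnitude-threshold ternarization sketch. This is not the CRQ algorithm itself (its cluster regularizer is not detailed here), and `threshold_ratio` is an assumed hyperparameter:

```python
import numpy as np

def ternarize(weights, threshold_ratio=0.7):
    """Map full-precision weights to {-alpha, 0, +alpha}.

    Generic threshold-based ternarization (illustrative, not CRQ):
    weights whose magnitude falls below a threshold become 0; the
    rest share a single per-tensor scale alpha.
    """
    delta = threshold_ratio * np.mean(np.abs(weights))  # zeroing threshold
    mask = np.abs(weights) > delta                      # positions kept non-zero
    alpha = np.abs(weights[mask]).mean() if mask.any() else 0.0
    return alpha * np.sign(weights) * mask

# A 5-element tensor collapses to at most three distinct values.
q = ternarize(np.array([0.9, -0.05, 0.4, -0.8, 0.02]))
```

Storing such a tensor needs only two bits per weight plus one scale, which is the storage saving ternary methods target.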
3 Citations
Compressing Deep Networks by Neuron Agglomerative Clustering
This paper introduces a method for compressing the structure and parameters of DNNs based on neuron agglomerative clustering (NAC), and demonstrates that NAC is very effective for the neuron agglomeration of both fully connected and convolutional layers, delivering similar or even higher network accuracy.
A New Clustering-Based Technique for the Acceleration of Deep Convolutional Networks
This work proposes a clustering-based approach that is able to increase the number of employed centroids/representatives while, at the same time, achieving an acceleration gain compared to conventional k-means based approaches within the MCA framework.
Second-Order Response Transform Attention Network for Image Classification
This work proposes a novel Second-order Response Transform Attention Network (SoRTA-Net) for classification tasks, which can be flexibly inserted into existing CNNs without any modification of network topology.

References
Soft Weight-Sharing for Neural Network Compression
This paper shows that competitive compression rates can be achieved by using a version of “soft weight-sharing” (Nowlan & Hinton, 1992) and achieves both quantization and pruning in one simple (re-)training procedure, exposing the relation between compression and the minimum description length (MDL) principle.
Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding
This work introduces "deep compression", a three-stage pipeline of pruning, trained quantization and Huffman coding, whose stages work together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.
Trained Ternary Quantization
This work proposes Trained Ternary Quantization (TTQ), a method that can reduce the precision of weights in neural networks to ternary values while improving the accuracy of some models (32-, 44-, 56-layer ResNet) on CIFAR-10 and AlexNet on ImageNet.
Weighted-Entropy-Based Quantization for Deep Neural Networks
This paper proposes a novel method for quantizing weights and activations based on the concept of weighted entropy, which achieves significant reductions in both the model size and the amount of computation with minimal accuracy loss.
Two-Step Quantization for Low-bit Neural Networks
A simple yet effective Two-Step Quantization (TSQ) framework is proposed, decomposing the network quantization problem into two steps, code learning and transformation function learning based on the learned codes, with a sparse quantization method proposed for code learning.
Deep k-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions
A novel spectrally relaxed $k$-means regularization is introduced, which tends to make hard assignments of convolutional layer weights to learned cluster centers during re-training, and an improved set of metrics to estimate the energy consumption of CNN hardware implementations is proposed.
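The hard-assignment parameter sharing behind such methods can be sketched with plain 1-D k-means over a layer's weight values. This omits the spectral relaxation and re-training regularizer that are the paper's actual contributions; the initialization and iteration count here are assumptions:

```python
import numpy as np

def kmeans_share(weights, k=4, iters=20):
    """Quantize a weight tensor by clustering its scalar values with
    1-D k-means and replacing each weight by its centroid, so the layer
    stores only k centroids plus a small index per weight."""
    flat = weights.ravel()
    centroids = np.linspace(flat.min(), flat.max(), k)  # deterministic init
    for _ in range(iters):
        # Hard-assign every weight to its nearest centroid.
        assign = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            members = flat[assign == j]
            if members.size:                            # skip empty clusters
                centroids[j] = members.mean()
    return centroids[assign].reshape(weights.shape), centroids
```

With k centroids, each weight costs only log2(k) index bits instead of 32 bits, which is where the compression comes from.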
Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures
This paper introduces network trimming which iteratively optimizes the network by pruning unimportant neurons based on analysis of their outputs on a large dataset, inspired by an observation that the outputs of a significant portion of neurons in a large network are mostly zero. Expand
BinaryConnect: Training Deep Neural Networks with binary weights during propagations
BinaryConnect is introduced, a method that trains a DNN with binary weights during the forward and backward propagations while retaining the precision of the stored weights in which gradients are accumulated; near state-of-the-art results with BinaryConnect are obtained on permutation-invariant MNIST, CIFAR-10 and SVHN.
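The two-copy scheme described above, binary weights for propagation and a full-precision copy for gradient accumulation, can be sketched on a toy linear model; the model, loss, and clipping range here are illustrative choices, not taken from the paper:

```python
import numpy as np

def binarize(w):
    """Deterministic sign binarization used during propagation."""
    return np.where(w >= 0, 1.0, -1.0)

def train_step(w_real, x, y, lr=0.1):
    """One SGD step in the BinaryConnect style on a linear regressor."""
    w_bin = binarize(w_real)            # forward/backward use binary weights
    pred = x @ w_bin                    # forward pass with binary weights
    grad = x.T @ (pred - y) / len(x)    # gradient of 0.5 * MSE w.r.t. weights
    w_real = w_real - lr * grad         # accumulate into full-precision copy
    return np.clip(w_real, -1.0, 1.0)   # keep the real-valued copy bounded
```

Because updates accumulate in `w_real`, many small gradients can eventually flip a weight's sign, which a purely binary update could never do.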
Factorized Convolutional Neural Networks
The proposed convolutional layer is composed of a low-cost single intra-channel convolution and a linear channel projection that can effectively preserve the spatial information and maintain the accuracy with significantly less computation.
Performance Guaranteed Network Acceleration via High-Order Residual Quantization
This paper proposes a high-order binarization scheme, which achieves more accurate approximation while still possessing the advantage of binary operations; it recursively performs residual quantization and yields a series of binary input images with decreasing magnitude scales.
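The recursive residual idea reads roughly as follows; the scale choice alpha = mean(|residual|) is an assumption of this sketch, not necessarily the paper's exact solution:

```python
import numpy as np

def residual_binarize(x, order=2):
    """Approximate x by a sum of scaled sign tensors: at each level,
    binarize the current residual as alpha * sign(residual), subtract,
    and recurse on the remainder. Higher order -> smaller error."""
    residual = x.astype(float)
    levels = []
    for _ in range(order):
        alpha = np.abs(residual).mean()  # scale for this binary level
        b = np.sign(residual)
        levels.append((alpha, b))
        residual = residual - alpha * b  # next level sees a smaller residual
    approx = sum(a * b for a, b in levels)
    return levels, approx
```

Each extra level adds one more binary tensor with a shrinking scale, so the approximation error decreases while every term still admits cheap binary arithmetic.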