Corpus ID: 236318574

Pruning Ternary Quantization

  • Dan Liu, Xi Chen, Jie Fu, Xue Liu
  • Published 2021
  • Computer Science
  • ArXiv
We propose pruning ternary quantization (PTQ), a simple yet effective symmetric ternary quantization method. The method significantly compresses neural network weights to sparse ternary values in {−1, 0, 1} and thus reduces computational, storage, and memory footprints. We show that PTQ can convert regular weights to ternary orthonormal bases simply by using pruning and L2 projection. In addition, we introduce a refined straight-through estimator to finalize and stabilize the quantized weights…
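A minimal sketch of the pruning-plus-projection idea summarized above, assuming a single global sparsity fraction and one scale per tensor: the smallest-magnitude weights are set to zero, the rest keep their signs, and the scale alpha is the L2-optimal one for those fixed codes. The helper `prune_ternarize` and its `sparsity` parameter are hypothetical; this is not the authors' PTQ code, and it leaves out their refined straight-through estimator.

```python
def prune_ternarize(weights, sparsity):
    """Prune the smallest-magnitude weights, then project the rest onto
    alpha * {-1, 0, +1} using the L2-optimal alpha for the resulting codes."""
    n = len(weights)
    k = int(round(sparsity * n))  # number of weights to prune
    order = sorted(range(n), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    # Ternary codes: sign of each kept weight, 0 for pruned weights.
    codes = [0 if i in pruned else (w > 0) - (w < 0) for i, w in enumerate(weights)]
    # For fixed codes t, argmin_alpha ||w - alpha * t||^2 = <w, t> / <t, t>,
    # which is the mean magnitude of the kept weights.
    nnz = sum(c * c for c in codes)
    alpha = sum(w * c for w, c in zip(weights, codes)) / nnz if nnz else 0.0
    return alpha, codes
```

For example, `prune_ternarize([0.9, -0.05, -0.7, 0.1, 0.5, -0.02], sparsity=0.5)` zeroes the three smallest weights, giving codes `[1, 0, -1, 0, 1, 0]` and alpha close to 0.7, the mean magnitude of the survivors.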



Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding
This work introduces "deep compression", a three-stage pipeline of pruning, trained quantization, and Huffman coding whose stages work together to reduce the storage requirements of neural networks by 35x to 49x without affecting their accuracy.
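A toy sketch of how the three stages compose on a flat list of weights, assuming magnitude pruning with a fixed threshold, 1-D k-means weight sharing with linearly initialized centroids (the fine-tuning of shared values is omitted), and Huffman coding of the resulting cluster indices. All function names are hypothetical; this illustrates the pipeline, not the paper's implementation.

```python
import heapq
from collections import Counter

def prune(weights, threshold):
    # Stage 1 (pruning): zero out weights whose magnitude is below the threshold.
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def kmeans_1d(values, k, iters=20):
    # Stage 2 (weight sharing): 1-D k-means over the surviving weights, with
    # centroids initialized linearly between their min and max. Requires k >= 2.
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * j / (k - 1) for j in range(k)]

    def assign():
        return [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]

    for _ in range(iters):
        labels = assign()
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:  # leave empty clusters where they are
                centroids[j] = sum(members) / len(members)
    return centroids, assign()

def huffman_code_lengths(symbols):
    # Stage 3 (Huffman coding): code length in bits for each distinct symbol.
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (count, unique tiebreak, {symbol: depth in the subtree}).
    heap = [(c, i, {s: 0}) for i, (s, c) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]
```

In use, one would prune with `prune(weights, 0.1)`, cluster the non-zero survivors with `kmeans_1d(kept, k=3)`, and sum the Huffman lengths over the labels to get the size of the encoded indices; a real format would also store the sparse positions and the codebook.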
Trained Ternary Quantization
This work proposes Trained Ternary Quantization (TTQ), a method that reduces the precision of neural network weights to ternary values while improving the accuracy of some models (32-, 44-, and 56-layer ResNets) on CIFAR-10 and of AlexNet on ImageNet.
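A sketch of the ternarization rule as we understand TTQ: latent weights above a threshold map to a learned positive scale `w_p`, weights below the negative threshold map to `-w_n`, and the rest map to zero. Assumptions: the threshold is `t` times the largest latent magnitude with a hypothetical default `t = 0.05`, the paper's normalization of latent weights is omitted, and `ttq_forward` and `ttq_scale_grads` are illustrative names. The scale gradients below are derived by the chain rule from this forward rule and may differ in convention from the paper's update equations.

```python
def ttq_forward(latent, w_p, w_n, t=0.05):
    """Ternarize latent weights to {+w_p, 0, -w_n} using threshold t * max|w|."""
    delta = t * max(abs(w) for w in latent)
    out = []
    for w in latent:
        if w > delta:
            out.append(w_p)
        elif w < -delta:
            out.append(-w_n)
        else:
            out.append(0.0)
    return out

def ttq_scale_grads(latent, grad_out, t=0.05):
    """Chain-rule gradients of the loss w.r.t. w_p and w_n, given dL/d(ternary)."""
    delta = t * max(abs(w) for w in latent)
    g_p = sum(g for w, g in zip(latent, grad_out) if w > delta)
    # Ternary value is -w_n on the negative set, hence the minus sign.
    g_n = -sum(g for w, g in zip(latent, grad_out) if w < -delta)
    return g_p, g_n
```

Because each scale is shared by all weights in its set, its gradient is simply the (signed) sum of the output gradients over that set.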
CLIP-Q: Deep Network Compression Learning by In-parallel Pruning-Quantization
The proposed CLIP-Q method (Compression Learning by In-Parallel Pruning-Quantization) takes advantage of the complementary nature of pruning and quantization, compressing AlexNet, GoogLeNet, and ResNet-50 by 10-fold while preserving the uncompressed networks' accuracies on ImageNet.
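A toy sketch of pruning and quantizing a layer in a single step, assuming magnitude pruning by a given rate followed by splitting the surviving weights into 2**bits - 1 equal-width intervals, each represented by the mean of its members (one code is left for zero). This is our simplified illustration of combined pruning-quantization; CLIP-Q's exact partitioning and its learned per-layer rates may differ, and `prune_quantize` is a hypothetical name.

```python
def prune_quantize(weights, prune_rate, bits):
    """Zero the smallest-magnitude fraction of weights, then map each surviving
    weight to the mean of its interval among 2**bits - 1 equal-width intervals."""
    n = len(weights)
    k = int(round(prune_rate * n))
    order = sorted(range(n), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    kept = [i for i in range(n) if i not in pruned]
    out = [0.0] * n
    if not kept:
        return out
    # 2**bits codes in total: one for zero (pruned) and 2**bits - 1 for the rest.
    levels = 2 ** bits - 1
    lo = min(weights[i] for i in kept)
    hi = max(weights[i] for i in kept)
    width = (hi - lo) / levels or 1.0  # guard against all-equal survivors
    bins = {}
    for i in kept:
        b = min(int((weights[i] - lo) / width), levels - 1)
        bins.setdefault(b, []).append(i)
    # Each interval is represented by the mean of the weights that fall in it.
    for idxs in bins.values():
        mean = sum(weights[i] for i in idxs) / len(idxs)
        for i in idxs:
            out[i] = mean
    return out
```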
Additive Powers-of-Two Quantization: An Efficient Non-uniform Discretization for Neural Networks
Experimental results show that the proposed Additive Powers-of-Two (APoT) quantization method outperforms state-of-the-art methods and is even competitive with full-precision models, demonstrating its effectiveness.
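A sketch of an additive powers-of-two level set, assuming (per our reading of the APoT construction) n additive terms of k bits each, where term i takes values in {0, 2^-i, 2^-(i+n), ..., 2^-(i+(2^k-2)n)}, and the summed levels are rescaled so the largest equals a clipping value alpha. `apot_levels` and `quantize` are hypothetical helpers, and restoring the sign in `quantize` is our own simplification.

```python
from itertools import product

def apot_levels(n, k=2, alpha=1.0):
    """Non-negative levels that are sums of n power-of-two terms (2**(k*n) levels)."""
    terms = []
    for i in range(n):
        # 2**k choices per term: zero plus 2**k - 1 powers of two.
        terms.append([0.0] + [2.0 ** -(i + j * n) for j in range(2 ** k - 1)])
    sums = sorted({sum(combo) for combo in product(*terms)})
    gamma = alpha / sums[-1]  # rescale so the largest level equals alpha
    return [gamma * s for s in sums]

def quantize(x, levels):
    # Nearest level by magnitude, then restore the sign of x.
    mag = min(levels, key=lambda q: abs(abs(x) - q))
    return mag if x >= 0 else -mag
```

With n = 2 and k = 2 the two terms draw from disjoint bit positions, so all 16 sums are distinct; levels are denser near zero than a uniform grid, while every level remains a sum of a few shifts.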
Bridging the Accuracy Gap for 2-bit Quantized Neural Networks (QNN)
This paper proposes novel techniques that target weight and activation quantization separately, resulting in an overall quantized neural network (QNN) that achieves state-of-the-art classification accuracy across a range of popular models and datasets.
PACT: Parameterized Clipping Activation for Quantized Neural Networks
It is shown, for the first time, that both weights and activations can be quantized to 4 bits of precision while still achieving accuracy comparable to full-precision networks across a range of popular models and datasets.
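PACT's clipping activation can be sketched as follows, assuming scalar inputs and a fixed clipping level `alpha` (in PACT, alpha is a trained parameter; the training loop is omitted here). The clipped value 0.5 * (|x| - |x - alpha| + alpha) equals clip(x, 0, alpha), which is then quantized uniformly to 2**bits - 1 steps; the gradient with respect to alpha uses a straight-through estimate. The function names are hypothetical.

```python
def pact_quantize(x, alpha, bits):
    """PACT-style activation: clip to [0, alpha], then uniform k-bit quantization."""
    y = 0.5 * (abs(x) - abs(x - alpha) + alpha)  # equals min(max(x, 0), alpha)
    scale = (2 ** bits - 1) / alpha
    return round(y * scale) / scale

def pact_alpha_grad(x, alpha):
    """Straight-through gradient of the output w.r.t. alpha: 1 if x >= alpha else 0."""
    return 1.0 if x >= alpha else 0.0
```

Only inputs at or above alpha pass gradient to alpha, so training can shrink or grow the clipping range per layer.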
And the Bit Goes Down: Revisiting the Quantization of Neural Networks
This paper introduces a vector quantization method that aims to preserve the quality of the reconstruction of the network outputs rather than of its weights, minimizing the reconstruction error for in-domain inputs.
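The difference between reconstructing weights and reconstructing outputs can be written out in generic notation (our own symbols, not the paper's): W is a layer's weight matrix, Ŵ its quantized version restricted to a codebook-induced set Q, and X a matrix whose rows are in-domain inputs to the layer.

```latex
% Weight reconstruction (plain vector quantization of W):
\min_{\widehat{W} \in \mathcal{Q}} \lVert W - \widehat{W} \rVert_F^2
% Output reconstruction on in-domain inputs X:
\min_{\widehat{W} \in \mathcal{Q}} \lVert XW - X\widehat{W} \rVert_F^2
  = \operatorname{tr}\!\big( (W - \widehat{W})^\top X^\top X \, (W - \widehat{W}) \big)
```

Expanding the second objective shows that the weight error is weighted by X^T X, so errors along directions that in-domain inputs rarely excite are penalized less than under the first objective.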
Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights
Extensive experiments on the ImageNet classification task using almost all known deep CNN architectures, including AlexNet, VGG-16, GoogLeNet, and ResNets, testify to the efficacy of the proposed INQ, showing that with 5-bit quantization, models achieve higher accuracy than their 32-bit floating-point references.
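A toy sketch of one incremental step, assuming weights are partitioned by magnitude, the larger-magnitude fraction of the still-free weights is rounded to the nearest power of two (or to zero below a minimum exponent) and frozen, and the rest stays full precision for retraining, which is omitted. `inq_step`, `nearest_pow2`, `fraction`, and `min_exp` are hypothetical names; INQ's actual partition schedule and level bounds may differ.

```python
import math

def nearest_pow2(w, min_exp):
    """Round |w| to the nearest power of two, or to 0 below half of 2**min_exp."""
    a = abs(w)
    if a < 2.0 ** min_exp / 2:
        return 0.0
    # Nearest power of two in linear distance: the midpoint between beta/2 and
    # beta is 3*beta/4, so beta = 2**floor(log2(4|w|/3)).
    beta = 2.0 ** math.floor(math.log2(4 * a / 3))
    beta = max(beta, 2.0 ** min_exp)
    return math.copysign(beta, w)

def inq_step(weights, frozen, fraction, min_exp=-4):
    """Quantize and freeze the largest-magnitude `fraction` of still-free weights."""
    free = [i for i in range(len(weights)) if i not in frozen]
    free.sort(key=lambda i: abs(weights[i]), reverse=True)
    n_new = int(round(fraction * len(free)))
    out = list(weights)
    new_frozen = set(frozen)
    for i in free[:n_new]:
        out[i] = nearest_pow2(weights[i], min_exp)
        new_frozen.add(i)
    # Weights not in new_frozen would be retrained before the next step.
    return out, new_frozen
```

Calling `inq_step` repeatedly with growing coverage (for example half of the weights, then all remaining ones) ends with every weight being zero or a signed power of two.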
LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
This work proposes to jointly train a quantized, bit-operation-compatible DNN and its associated quantizers, as opposed to using fixed, handcrafted quantization schemes such as uniform or logarithmic quantization, to address the gap in prediction accuracy between the quantized model and the full-precision model.
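A sketch of a learned quantizer in the LQ-Nets style, assuming (as we understand the method) that quantization levels are v · e for a learnable basis v and binary codes e in {-1, +1}^K, and that v is fitted by alternating between assigning each value its nearest level and solving least squares for v. The example is restricted to K = 2 so the normal equations can be solved in closed form; `lq_levels` and `qem_step` are hypothetical names.

```python
from itertools import product

def lq_levels(v):
    """All levels v . e for e in {-1, +1}^K, paired with their codes."""
    return [(sum(vi * ei for vi, ei in zip(v, e)), e)
            for e in product((-1, 1), repeat=len(v))]

def qem_step(x, v):
    """One alternating step for K = 2: assign codes, then refit v by least squares."""
    levels = lq_levels(v)
    codes = [min(levels, key=lambda le: abs(xi - le[0]))[1] for xi in x]
    # Normal equations (B^T B) v = B^T x for the N x 2 code matrix B.
    n = len(x)
    b12 = sum(e[0] * e[1] for e in codes)
    r1 = sum(e[0] * xi for e, xi in zip(codes, x))
    r2 = sum(e[1] * xi for e, xi in zip(codes, x))
    det = n * n - b12 * b12  # B^T B = [[n, b12], [b12, n]] since e_k**2 = 1
    if det == 0:
        return v, codes  # degenerate codes: keep the current basis
    new_v = ((n * r1 - b12 * r2) / det, (n * r2 - b12 * r1) / det)
    return new_v, codes
```

Starting from `v = (0.8, 0.3)` on the values `[-1.5, -0.5, 0.5, 1.5]`, one step recovers the basis `(1.0, 0.5)`, whose four levels match the data exactly.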
Learning Discrete Weights Using the Local Reparameterization Trick
This work introduces LR-nets (Local reparameterization networks), a new method for training neural networks with discrete weights using stochastic parameters, and shows how a simple modification to the local reparameterization trick enables the training of discrete weights.
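The local reparameterization idea for discrete weights can be sketched as follows, assuming each ternary weight is an independent random variable over {-1, 0, +1} with given probabilities and that the pre-activation is approximated as Gaussian with the matching mean and variance. How LR-nets parameterizes those probabilities is omitted, and `ternary_moments` and `lr_preactivation` are hypothetical names.

```python
import math
import random

def ternary_moments(p_neg, p_zero, p_pos):
    """Mean and variance of a weight taking -1, 0, +1 with the given probabilities."""
    assert abs(p_neg + p_zero + p_pos - 1.0) < 1e-9  # probabilities must sum to 1
    mean = p_pos - p_neg
    var = p_pos + p_neg - mean ** 2  # E[w^2] - E[w]^2, with E[w^2] = p_pos + p_neg
    return mean, var

def lr_preactivation(x, probs, rng=random):
    """Sample z = sum_i x_i w_i via the local reparameterization trick:
    z ~ N(sum_i x_i mu_i, sum_i x_i^2 sigma_i^2) instead of sampling each w_i."""
    mu, var = 0.0, 0.0
    for xi, (pn, pz, pp) in zip(x, probs):
        m, v = ternary_moments(pn, pz, pp)
        mu += xi * m
        var += xi * xi * v
    eps = rng.gauss(0.0, 1.0)  # one noise sample per pre-activation
    return mu + math.sqrt(var) * eps, mu, var
```

Sampling one Gaussian per pre-activation, rather than one discrete value per weight, makes the sample a differentiable function of the weight probabilities through mu and var.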