Demystifying and Generalizing BinaryConnect
@inproceedings{Dockhorn2021DemystifyingAG,
  title     = {Demystifying and Generalizing BinaryConnect},
  author    = {Tim Dockhorn and Yaoliang Yu and Eyyub Sari and Mahdi Zolnouri and V. Nia},
  booktitle = {Neural Information Processing Systems},
  year      = {2021}
}
BinaryConnect (BC) and its many variations have become the de facto standard for neural network quantization. However, our understanding of the inner workings of BC is still quite limited. We attempt to close this gap in four different aspects: (a) we show that existing quantization algorithms, including post-training quantization, are surprisingly similar to each other; (b) we argue for proximal maps as a natural family of quantizers that is both easy to design and analyze; (c) we refine the…
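The abstract's claim that quantizers are naturally viewed as proximal maps can be made concrete with a toy example. The NumPy sketch below illustrates the general idea, not the authors' exact algorithm: the sign function acts as the proximal map (Euclidean projection) onto {-1, +1}^n, and a BinaryConnect-style update computes the gradient at the quantized weights and applies it to the latent real-valued weights. The loss, target t, and learning rate are placeholders of this sketch.

```python
import numpy as np

def prox_binary(w):
    """Euclidean projection of w onto {-1, +1}^n, i.e. the proximal map
    of the indicator function of the binary set: the elementwise sign."""
    return np.where(w >= 0.0, 1.0, -1.0)

def bc_step(w_latent, grad_fn, lr=0.01):
    """One BinaryConnect-style step (illustrative sketch, not the paper's
    exact algorithm): quantize the latent weights, evaluate the gradient
    at the quantized point, and apply it to the latent (real-valued)
    weights, which are kept clipped to [-1, 1]."""
    w_q = prox_binary(w_latent)                   # forward/backward use binary weights
    g = grad_fn(w_q)                              # gradient w.r.t. the quantized weights
    return np.clip(w_latent - lr * g, -1.0, 1.0)  # update the latent weights

# Toy usage: minimize ||w - t||^2 for a fixed target t (placeholder loss).
t = np.array([2.0, -3.0, 1.5, -2.5])
grad_fn = lambda w: 2.0 * (w - t)
w = np.zeros_like(t)
for _ in range(100):
    w = bc_step(w, grad_fn)
print(prox_binary(w))   # -> [ 1. -1.  1. -1.]
```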
3 Citations
SiMaN: Sign-to-Magnitude Network Binarization
- Computer Science, IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2022
It is shown that weight binarization admits an analytical solution that encodes high-magnitude weights as +1 and the rest as 0, so that a high-quality discrete solution is obtained in a computationally efficient manner without the sign function.
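A minimal NumPy illustration of the encoding described above (high-magnitude weights to +1, the rest to 0); the keep ratio and simple magnitude threshold are assumptions of this sketch, not SiMaN's full derivation.

```python
import numpy as np

def sign_to_magnitude_binarize(w, keep_ratio=0.5):
    """Illustrative sketch of the encoding summarized above (not SiMaN's
    full method): keep the highest-magnitude fraction of weights as +1
    and set the rest to 0. `keep_ratio` is a hypothetical knob."""
    k = max(1, int(round(keep_ratio * w.size)))
    thresh = np.partition(np.abs(w).ravel(), -k)[-k]   # k-th largest magnitude
    return (np.abs(w) >= thresh).astype(w.dtype)

w = np.array([0.9, -0.05, 0.4, -0.7, 0.01, 0.3])
print(sign_to_magnitude_binarize(w, keep_ratio=0.5))   # -> [1. 0. 1. 1. 0. 0.]
```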
Spartan: Differentiable Sparsity via Regularized Transportation
- Computer Science, arXiv
- 2022
We present Spartan, a method for training sparse neural network models with a predetermined level of sparsity. Spartan is based on a combination of two techniques: (1) soft top-k masking of…
Channel Pruning In Quantization-aware Training: An Adaptive Projection-gradient Descent-shrinkage-splitting Method
- Computer Science, arXiv
- 2022
An adaptive projection-gradient descent-shrinkage-splitting method is proposed to integrate penalty-based channel pruning into quantization-aware training (QAT), together with a novel complementary transformed L1 penalty that stabilizes training for extreme compression.
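For reference, the transformed L1 penalty commonly used in the sparse-optimization literature has the form below; whether the cited work's "complementary" variant uses exactly this form is an assumption here.

```latex
% Transformed L1 (TL1) penalty on a scalar x, with parameter a > 0.
% As a -> 0 it behaves like an l0-type penalty; as a -> infinity it
% approaches |x| (the l1 penalty). For a weight vector w, sum over
% coordinates:
\rho_a(x) \;=\; \frac{(a+1)\,\lvert x\rvert}{a + \lvert x\rvert},
\qquad
R_a(w) \;=\; \sum_{i} \rho_a(w_i).
```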
References
Showing 1-10 of 62 references
Mirror Descent View for Neural Network Quantization
- Computer Science, AISTATS
- 2021
By interpreting the unconstrained continuous parameters as the dual of the quantized ones, a Mirror Descent (MD) framework for NN quantization is introduced, and conditions on the projections are provided that yield valid mirror maps and, in turn, the corresponding MD updates.
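As one concrete, heavily simplified instance of a mirror-descent update for quantization, the sketch below keeps unconstrained dual variables m and recovers primal weights in (-1, 1) via tanh, which corresponds to an entropy-type mirror map on that interval; the specific mirror maps derived in the cited paper may differ. The quadratic loss and step size are placeholders.

```python
import numpy as np

def md_quantization_step(m, grad_fn, lr=0.1):
    """Illustrative mirror-descent-style step (not the paper's specific
    mirror maps): the unconstrained 'dual' variables m live in R^n, the
    primal weights are obtained through the smooth link tanh(m) in
    (-1, 1), and the gradient at the primal point updates the duals."""
    w = np.tanh(m)            # primal (soft-quantized) weights
    g = grad_fn(w)            # gradient w.r.t. the primal weights
    return m - lr * g         # gradient step in the dual space

# Toy usage with a placeholder quadratic loss ||w - t||^2:
t = np.array([0.9, -0.8, 0.3])
grad_fn = lambda w: 2.0 * (w - t)
m = np.zeros_like(t)
for _ in range(500):
    m = md_quantization_step(m, grad_fn)
print(np.sign(np.tanh(m)))    # hard quantization of the learned weights
```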
Training Binary Neural Networks with Real-to-Binary Convolutions
- Computer Science, ICLR
- 2020
This paper shows how to build a strong baseline that already achieves state-of-the-art accuracy by combining recently proposed advances and carefully tuning the optimization procedure, and further minimizes the discrepancy between the outputs of the binary and the corresponding real-valued convolutions.
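A minimal sketch of one way to measure such a discrepancy, assuming a generic normalized L2 match between the two feature maps; this is not claimed to be the paper's exact matching loss.

```python
import numpy as np

def output_matching_loss(real_feat, binary_feat, eps=1e-8):
    """Illustrative discrepancy between the outputs of a real-valued
    convolution and its binary counterpart: a normalized L2 match
    (a generic choice, not the cited paper's exact formulation)."""
    r = real_feat / (np.linalg.norm(real_feat) + eps)
    b = binary_feat / (np.linalg.norm(binary_feat) + eps)
    return float(np.sum((r - b) ** 2))

# Toy usage with random stand-ins for the two feature maps:
rng = np.random.default_rng(0)
real_feat = rng.standard_normal((8, 8))
binary_feat = np.sign(real_feat) * np.abs(real_feat).mean()
print(output_matching_loss(real_feat, binary_feat))
```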
BinaryRelax: A Relaxation Approach For Training Deep Neural Networks With Quantized Weights
- Computer Science, SIAM J. Imaging Sci.
- 2018
BinaryRelax is proposed, a simple two-phase algorithm for training deep neural networks with quantized weights that relaxes the hard quantization constraint into a continuous regularizer via the Moreau envelope, which turns out to be the squared Euclidean distance to the set of quantized weights.
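For reference, the relaxation described above can be written compactly (a sketch in standard notation; the cited paper's exact scaling and annealing schedule may differ): the hard constraint w ∈ Q is the indicator ι_Q, and its Moreau envelope is proportional to the squared Euclidean distance to Q.

```latex
% Hard-constrained problem and its Moreau-envelope relaxation (sketch):
\min_{w}\; f(w) \quad \text{s.t.}\; w \in \mathcal{Q}
\;\;\longrightarrow\;\;
\min_{w}\; f(w) + \lambda\,\big(\iota_{\mathcal{Q}}\big)_{\mu}(w),
\qquad
\big(\iota_{\mathcal{Q}}\big)_{\mu}(w)
= \min_{u \in \mathcal{Q}} \frac{1}{2\mu}\,\lVert w - u\rVert_2^2
= \frac{1}{2\mu}\,\mathrm{dist}^2(w, \mathcal{Q}).
```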
Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights
- Computer Science, ICLR
- 2017
Extensive experiments on the ImageNet classification task using almost all known deep CNN architectures, including AlexNet, VGG-16, GoogleNet, and ResNets, testify to the efficacy of the proposed INQ, showing that at 5-bit quantization the models achieve higher accuracy than their 32-bit floating-point references.
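The sketch below illustrates the incremental, partition-then-quantize idea in NumPy: the largest-magnitude weights are quantized to powers of two in stages and frozen, while retraining of the still-full-precision weights is omitted. The fraction schedule and the simple power-of-two rounding are assumptions of this sketch, not INQ's exact procedure.

```python
import numpy as np

def quantize_pow2(w):
    """Round each nonzero weight to the nearest power of two, keeping its
    sign (a simplified stand-in for INQ's power-of-two codebook)."""
    out = np.zeros_like(w, dtype=float)
    nz = w != 0
    out[nz] = np.sign(w[nz]) * 2.0 ** np.round(np.log2(np.abs(w[nz])))
    return out

def inq_schedule(w, fractions=(0.5, 0.75, 0.875, 1.0)):
    """Illustrative INQ-style loop (retraining omitted): at each stage,
    quantize the largest-magnitude weights up to the given cumulative
    fraction and freeze them; in the real algorithm the remaining
    full-precision weights would then be retrained."""
    w = w.astype(float).copy()
    frozen = np.zeros(w.shape, dtype=bool)
    for frac in fractions:
        k = int(round(frac * w.size))
        order = np.argsort(-np.abs(w), axis=None)[:k]   # largest magnitudes
        sel = np.zeros(w.size, dtype=bool)
        sel[order] = True
        sel = sel.reshape(w.shape) & ~frozen            # newly quantized group
        w[sel] = quantize_pow2(w[sel])
        frozen |= sel
        # ... retrain the still-unfrozen weights here in the real method ...
    return w

print(inq_schedule(np.array([0.9, -0.3, 0.12, -0.05])))
```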
MetaQuant: Learning to Quantize by Learning to Penetrate Non-differentiable Quantization
- Computer Science, NeurIPS
- 2019
A meta network is trained that takes g_q and r as inputs and outputs g_r for subsequent weight updates, which alleviates the problem of non-differentiability and can be trained in an end-to-end manner.
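A rough sketch of the idea as summarized above, assuming r denotes a per-weight quantization residual: a small shared network maps (g_q, r) to a surrogate gradient g_r. The two-layer MLP, its parameter names, and the elementwise application are assumptions of this sketch; in the real method the meta network itself is trained end to end.

```python
import numpy as np

def meta_gradient(g_q, r, W1, b1, W2, b2):
    """Illustrative sketch of the summarized idea: a small shared network
    maps, per weight, the incoming gradient g_q and the (assumed)
    quantization residual r to a surrogate gradient g_r used to update
    the full-precision weights. W1, b1, W2, b2 are hypothetical
    parameters of this sketch's two-layer MLP."""
    x = np.stack([g_q.ravel(), r.ravel()], axis=1)   # (n, 2) per-weight features
    h = np.maximum(x @ W1 + b1, 0.0)                 # ReLU hidden layer
    return (h @ W2 + b2).reshape(g_q.shape)

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((2, 8)) * 0.1, np.zeros(8)
W2, b2 = rng.standard_normal((8, 1)) * 0.1, np.zeros(1)
g_q = rng.standard_normal((3, 3))
r = rng.standard_normal((3, 3)) * 0.01
print(meta_gradient(g_q, r, W1, b1, W2, b2).shape)   # (3, 3)
```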
The High-Dimensional Geometry of Binary Neural Networks
- Computer Science, ICLR
- 2017
This work explains why multilayer binary neural networks work in terms of high-dimensional geometry and serves as a foundation for understanding not only BNNs but a variety of methods that seek to compress traditional neural networks.
Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM
- Computer Science, AAAI
- 2018
This paper focuses on compressing and accelerating deep models whose network weights are represented with very small numbers of bits, referred to as extremely low bit neural networks, and proposes to solve the resulting problem with extragradient and iterative quantization algorithms that lead to considerably faster convergence than conventional optimization methods.
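A standard way to write the splitting behind this kind of approach, in generic notation (the cited paper additionally uses extragradient steps for the W-subproblem, and its exact quantization set and updates may differ):

```latex
% Low-bit training as a constrained problem, decoupled with ADMM
% (a sketch of the splitting, not the paper's exact updates):
\min_{W,\,G}\; f(W) + \iota_{\mathcal{Q}}(G) \quad \text{s.t.}\; W = G,
\qquad \mathcal{Q} = \{0, \pm\alpha, \pm 2\alpha, \dots\}.
% Scaled-dual ADMM iterations:
W^{k+1} = \arg\min_{W}\; f(W) + \tfrac{\rho}{2}\,\lVert W - G^{k} + \lambda^{k}\rVert_2^2,
\quad
G^{k+1} = \Pi_{\mathcal{Q}}\!\big(W^{k+1} + \lambda^{k}\big),
\quad
\lambda^{k+1} = \lambda^{k} + W^{k+1} - G^{k+1}.
```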
Alternating Multi-bit Quantization for Recurrent Neural Networks
- Computer Science, ICLR
- 2018
This work quantizes the network, both weights and activations, into multiple binary codes {-1, +1}, formulates the quantization as an optimization problem, achieves excellent performance on both RNNs and feedforward neural networks, and is extended to image classification tasks.
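The optimization problem described above can be sketched with a simple alternating solver in NumPy: the binary-code step assigns each weight to the nearest representable value, and the scale step is a least-squares fit. This is a simplified illustration, not the paper's exact solver.

```python
import numpy as np

def multibit_quantize(w, k=2, iters=10):
    """Illustrative alternating minimization for
        min_{alpha, B} || w - sum_i alpha_i * b_i ||^2,  b_i in {-1, +1}^n
    (a simplified sketch of the formulation, not the paper's solver)."""
    # Greedy residual initialization of codes and scales.
    B = np.empty((k, w.size))
    alphas = np.empty(k)
    resid = w.astype(float).copy()
    for i in range(k):
        B[i] = np.where(resid >= 0.0, 1.0, -1.0)
        alphas[i] = np.abs(resid).mean()
        resid -= alphas[i] * B[i]
    # All 2^k sign combinations, one per column.
    codes = np.array(np.meshgrid(*([[-1.0, 1.0]] * k))).reshape(k, -1)
    for _ in range(iters):
        # B-step: nearest representable value for each weight.
        vals = alphas @ codes                                  # (2^k,) levels
        idx = np.argmin(np.abs(w[:, None] - vals[None, :]), axis=1)
        B = codes[:, idx]                                      # (k, n)
        # alpha-step: least-squares fit of the scales.
        alphas, *_ = np.linalg.lstsq(B.T, w, rcond=None)
    return alphas, B

w = np.array([0.9, -0.3, 0.12, -0.75, 0.4])
alphas, B = multibit_quantize(w, k=2)
print(np.round(alphas @ B, 3))   # multi-bit approximation of w
```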
Weighted-Entropy-Based Quantization for Deep Neural Networks
- Computer Science, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
This paper proposes a novel method for quantizing weights and activations based on the concept of weighted entropy, which achieves significant reductions in both the model size and the amount of computation with minimal accuracy loss.