Corpus ID: 239885955

Demystifying and Generalizing BinaryConnect

@inproceedings{dockhorn2021demystifying,
  title={Demystifying and Generalizing BinaryConnect},
  author={Tim Dockhorn and Yaoliang Yu and Eyyub Sari and Mahdi Zolnouri and V. Nia},
  booktitle={Neural Information Processing Systems},
  year={2021}
}
BinaryConnect (BC) and its many variations have become the de facto standard for neural network quantization. However, our understanding of the inner workings of BC is still quite limited. We attempt to close this gap in four different aspects: (a) we show that existing quantization algorithms, including post-training quantization, are surprisingly similar to each other; (b) we argue for proximal maps as a natural family of quantizers that is both easy to design and analyze; (c) we refine the… 
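
The core BC recipe the abstract refers to can be sketched as follows: forward and backward passes use binarized weights, while the gradient updates a latent full-precision copy. This is a minimal illustrative sketch, not the paper's code; the function names and the simple quadratic-loss usage are assumptions.

```python
import numpy as np

def binaryconnect_step(w_real, grad_fn, lr=0.01):
    """One illustrative BinaryConnect step (a sketch, not the paper's code).

    The loss gradient is evaluated at the binarized weights sign(w_real),
    but the update is applied to the real-valued latent weights, which are
    clipped to [-1, 1] as in standard BC.
    """
    w_bin = np.sign(w_real)            # project onto {-1, +1}
    w_bin[w_bin == 0] = 1.0            # break ties at zero
    g = grad_fn(w_bin)                 # gradient at the binary weights
    return np.clip(w_real - lr * g, -1.0, 1.0)
```

Note that when the gradient at the binary point vanishes, the latent weights are unchanged; the latent copy only moves in response to the loss seen through the quantizer.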

SiMaN: Sign-to-Magnitude Network Binarization

It is shown that weight binarization admits an analytical solution that encodes high-magnitude weights as +1s and the rest as 0s, so a high-quality discrete solution is obtained in a computationally efficient manner without the sign function.
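
The encoding described above (largest magnitudes to +1, the rest to 0) can be sketched directly; the function name and the explicit `k` parameter are assumptions for illustration, not SiMaN's actual interface.

```python
import numpy as np

def siman_binarize(w, k):
    """Sketch of sign-to-magnitude binarization: the k largest-magnitude
    weights are encoded as +1 and all others as 0 (interface assumed)."""
    b = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-k:]   # indices of the k largest magnitudes
    b[idx] = 1.0
    return b
```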

Spartan: Differentiable Sparsity via Regularized Transportation

We present Spartan, a method for training sparse neural network models with a predetermined level of sparsity. Spartan is based on a combination of two techniques: (1) soft top-k masking of…
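
A soft top-k mask can be illustrated with a simple sigmoid relaxation around the k-th largest score; note this is only a hedged sketch of the general idea — Spartan itself computes its mask via regularized optimal transportation, and the function name and temperature parameter here are assumptions.

```python
import numpy as np

def soft_topk_mask(scores, k, temp=0.01):
    """Illustrative soft top-k mask (not Spartan's transport-based mask):
    a sigmoid centered at the k-th largest score, so the top-k entries
    approach 1 and the rest approach 0 as temp -> 0."""
    tau = np.sort(scores)[-k]                         # k-th largest score
    return 1.0 / (1.0 + np.exp(-(scores - tau) / temp))
```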

Channel Pruning In Quantization-aware Training: An Adaptive Projection-gradient Descent-shrinkage-splitting Method

An adaptive projection-gradient descent-shrinkage-splitting method is proposed to integrate penalty-based channel pruning into quantization-aware training (QAT), along with a novel complementary transformed l1 penalty that stabilizes training under extreme compression.

Mirror Descent View for Neural Network Quantization

By interpreting the continuous (unconstrained) parameters as the dual of the quantized ones, a Mirror Descent (MD) framework for NN quantization is introduced, together with conditions on the projections that yield valid mirror maps and, in turn, the corresponding MD updates.
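
The dual view above can be sketched with a concrete (assumed, purely illustrative) mirror map: take the quantized-side weights as w = tanh(u), so gradient steps happen in the unconstrained dual space u while w stays in (-1, 1) and saturates toward {-1, +1}. The tanh choice is this sketch's assumption, not necessarily a map derived in the paper.

```python
import numpy as np

def md_quant_step(u, grad_fn, lr=0.1):
    """One illustrative mirror-descent step for quantization (a sketch):
    u is the unconstrained dual parameter, w = tanh(u) the soft-quantized
    primal weight; the gradient at w drives an update in u-space."""
    w = np.tanh(u)              # mirror map: dual -> primal weights
    u = u - lr * grad_fn(w)     # gradient step in the dual space
    return u, np.tanh(u)
```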

Training Binary Neural Networks with Real-to-Binary Convolutions

This paper shows how to build a strong baseline, which already achieves state-of-the-art accuracy, by combining recently proposed advances and carefully tuning the optimization procedure to minimize the discrepancy between the output of the binary convolution and the corresponding real-valued one.

Binary Neural Networks: A Survey

BinaryRelax: A Relaxation Approach For Training Deep Neural Networks With Quantized Weights

BinaryRelax is proposed, a simple two-phase algorithm for training deep neural networks with quantized weights: the hard constraint is relaxed into a continuous regularizer via the Moreau envelope, which turns out to be the squared Euclidean distance to the set of quantized weights.
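
For the binary set {-1, +1}, the squared-distance regularizer mentioned above has a simple closed form, sketched below; the function name and the scalar weight `lam` are assumptions for illustration.

```python
import numpy as np

def binaryrelax_penalty(w, lam=0.1):
    """Sketch of the relaxed regularizer: lam times the squared Euclidean
    distance from w to its nearest point in {-1, +1}^n."""
    w_q = np.sign(w)           # nearest binary point coordinate-wise
    w_q[w_q == 0] = 1.0        # break ties at zero
    return lam * np.sum((w - w_q) ** 2)
```

Adding this penalty to the training loss pulls weights toward the quantized set smoothly, instead of enforcing the hard constraint from the start.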

Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights

Extensive experiments on the ImageNet classification task using almost all known deep CNN architectures, including AlexNet, VGG-16, GoogLeNet, and ResNets, testify to the efficacy of the proposed INQ, showing that at 5-bit quantization the models achieve higher accuracy than their 32-bit floating-point references.
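
INQ proceeds incrementally: in each round, a fraction of the largest-magnitude weights is quantized (to powers of two) while the rest stay full-precision and are retrained. The sketch below shows only that partition-and-quantize step, with an assumed interface; actual INQ also restricts the exponents to a bit-width-dependent range and interleaves retraining rounds.

```python
import numpy as np

def inq_partition_quantize(w, frac):
    """Sketch of one INQ partition step (interface assumed): snap the
    largest-magnitude fraction `frac` of weights to the nearest signed
    power of two; leave the remaining weights full-precision."""
    n = int(frac * w.size)
    idx = np.argsort(np.abs(w))[-n:]              # largest magnitudes
    wq = w.copy()
    mags = np.abs(w[idx])
    exps = np.round(np.log2(np.clip(mags, 1e-12, None)))
    wq[idx] = np.sign(w[idx]) * 2.0 ** exps       # nearest power of two
    return wq, idx
```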

MetaQuant: Learning to Quantize by Learning to Penetrate Non-differentiable Quantization

A meta network is trained using $g_q$ and $r$ as inputs and outputs $g_r$ for subsequent weight updates, which alleviates the problem of non-differentiability and can be trained in an end-to-end manner.
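
The mapping described above — from the quantized-weight gradient $g_q$ and a residual $r$ to a surrogate gradient $g_r$ — can be sketched as a tiny per-weight MLP. Everything here (function name, fixed weight matrices `W1`/`W2`, the specific architecture) is an assumption for illustration; MetaQuant learns its meta network end-to-end.

```python
import numpy as np

def meta_gradient(g_q, r, W1, W2):
    """Sketch of a MetaQuant-style meta network forward pass (all names
    assumed): a shared two-layer MLP maps per-weight features (g_q, r)
    to a surrogate gradient g_r for the latent full-precision weights."""
    x = np.stack([g_q, r], axis=-1)     # per-weight features (g_q, r)
    h = np.maximum(x @ W1, 0.0)         # hidden layer with ReLU
    return (h @ W2).squeeze(-1)         # surrogate gradient g_r
```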

The High-Dimensional Geometry of Binary Neural Networks

This work explains why multilayer binary neural networks work in terms of the HD geometry and serves as a foundation for understanding not only BNNs but a variety of methods that seek to compress traditional neural networks.

Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM

This paper focuses on compressing and accelerating deep models whose network weights are represented with very small numbers of bits, referred to as extremely low-bit neural networks, and proposes extragradient and iterative quantization algorithms that converge considerably faster than conventional optimization methods.
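
ADMM-style low-bit training alternates an unconstrained update with a projection onto the quantized set. For the scaled-binary set {-a, +a}^n that projection has a well-known closed form, sketched below; the function name is assumed, and this shows only the projection sub-step, not the full ADMM loop.

```python
import numpy as np

def project_scaled_binary(w):
    """Closed-form projection of w onto {-a, +a}^n (one ADMM sub-step in
    sketch form): the optimal scale is a = mean(|w|), the optimal code
    is sign(w)."""
    a = np.mean(np.abs(w))
    s = np.sign(w)
    s[s == 0] = 1.0            # break ties at zero
    return a * s
```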

Alternating Multi-bit Quantization for Recurrent Neural Networks

This work quantizes the network, both weights and activations, into multiple binary codes {-1,+1} and formulates the quantization as an optimization problem; the approach achieves excellent performance in both RNNs and feedforward neural networks and extends to image classification tasks.
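
The alternating scheme can be sketched concretely: approximate a weight vector w by a sum of scaled binary codes, alternating between solving for the scales (a least-squares problem) and greedily refitting the codes. This is a hedged sketch under assumed names and a simple greedy code update, not the paper's exact algorithm.

```python
import numpy as np

def alternating_multibit(w, n_bits=2, n_iters=5):
    """Sketch of alternating multi-bit quantization: approximate w by
    sum_i alpha_i * b_i with b_i in {-1,+1}^n, alternating between the
    scales alpha (least squares) and the codes b (greedy residual fit)."""
    B = np.zeros((n_bits, w.size))
    r = w.copy()
    for i in range(n_bits):                # greedy initialization
        B[i] = np.sign(r); B[i][B[i] == 0] = 1.0
        r = r - np.mean(np.abs(r)) * B[i]
    for _ in range(n_iters):
        alpha, *_ = np.linalg.lstsq(B.T, w, rcond=None)  # fix B, solve alpha
        r = w.copy()
        for i in np.argsort(-np.abs(alpha)):             # fix alpha, refit codes
            B[i] = np.sign(r); B[i][B[i] == 0] = 1.0
            r = r - alpha[i] * B[i]
    alpha, *_ = np.linalg.lstsq(B.T, w, rcond=None)
    return alpha, B
```

With two bits, a vector such as [0.9, -0.9, 0.3, -0.3] is reconstructed exactly by alpha = [0.6, 0.3] and the corresponding sign codes.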

Weighted-Entropy-Based Quantization for Deep Neural Networks

This paper proposes a novel method for quantizing weights and activations based on the concept of weighted entropy, which achieves significant reductions in both the model size and the amount of computation with minimal accuracy loss.