# Training Binary Weight Networks via Semi-Binary Decomposition

@inproceedings{Hu2018TrainingBW,
  title     = {Training Binary Weight Networks via Semi-Binary Decomposition},
  author    = {Qinghao Hu and Gang Li and Peisong Wang and Yifan Zhang and Jian Cheng},
  booktitle = {ECCV},
  year      = {2018}
}

Recently, binary weight networks have attracted a lot of attention due to their high computational efficiency and small parameter size. Yet they still suffer from large accuracy drops because of their limited representation capacity. In this paper, we propose a novel semi-binary decomposition method which decomposes a matrix into two binary matrices and a diagonal matrix. Since the matrix product of binary matrices takes more numerical values than a single binary matrix, the proposed semi-binary…
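The decomposition described above, W ≈ B1·D·B2 with binary factors and a diagonal scale matrix, can be illustrated with a short NumPy sketch (random factors and made-up shapes; the paper's learning procedure is not reproduced here):

```python
import numpy as np

# Minimal sketch of the semi-binary form W ~ B1 @ D @ B2, where B1 and B2
# hold only {-1, +1} entries and D is a full-precision diagonal matrix.
rng = np.random.default_rng(0)
m, k, n = 8, 16, 8

B1 = np.where(rng.standard_normal((m, k)) >= 0, 1.0, -1.0)  # binary factor
B2 = np.where(rng.standard_normal((k, n)) >= 0, 1.0, -1.0)  # binary factor
D = np.diag(rng.random(k))                                  # diagonal scales

W_approx = B1 @ D @ B2

# A single binary matrix with one scalar scale takes only 2 distinct
# values; the semi-binary product spans many more.
n_values = len(np.unique(np.round(W_approx, 8)))
```

This is what the abstract means by "more numerical values": the product's entries range over many combinations of the diagonal scales, not just ±α.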

#### 11 Citations

Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration

- Computer Science
- ArXiv
- 2021

A novel encoding scheme using {−1,+1} to decompose quantized neural networks (QNNs) into multi-branch binary networks, which can be efficiently implemented by bitwise operations (i.e., xnor and bitcount) to achieve model compression, computational acceleration, and resource saving.
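The {−1,+1} branch decomposition can be sketched as follows: a quantized weight is written as a weighted sum of sign matrices, w = Σᵢ 2ⁱ·bᵢ with bᵢ ∈ {−1,+1}, so each branch admits an xnor/bitcount kernel. A rough NumPy illustration under my own naming, not the paper's code:

```python
import numpy as np

def encode(w, bits=2):
    """Greedily peel sign branches from the most significant scale down."""
    branches = []
    residual = w.astype(float)
    for i in reversed(range(bits)):
        b = np.where(residual >= 0, 1.0, -1.0)   # {-1, +1} branch
        branches.append((2 ** i, b))
        residual = residual - (2 ** i) * b
    return branches

# Two branches cover the odd quantization levels {-3, -1, +1, +3}.
levels = np.array([-3.0, -1.0, 1.0, 3.0])
w = np.random.default_rng(1).choice(levels, size=(4, 4))
branches = encode(w, bits=2)
rec = sum(scale * b for scale, b in branches)     # exact reconstruction
```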

Matrix and tensor decompositions for training binary neural networks

- Computer Science
- ArXiv
- 2019

This paper is on improving the training of binary neural networks, in which both activations and weights are binary, by parametrizing the weight tensor of each layer using matrix or tensor decomposition; it significantly outperforms existing methods when tested on challenging tasks.

Binary Neural Networks: A Survey

- Computer Science
- Pattern Recognit.
- 2020

A comprehensive survey of algorithms proposed for binary neural networks is presented, mainly categorized into native solutions that directly conduct binarization and optimized ones using techniques such as minimizing the quantization error, improving the network loss function, and reducing the gradient error.

Sparsity-Inducing Binarized Neural Networks

- Computer Science
- AAAI
- 2020

This work proposes the Sparsity-inducing Binarized Neural Network (Si-BNN), to quantize the activations to be either 0 or +1, which introduces sparsity into binary representation, and introduces trainable thresholds into the backward function of binarization to guide the gradient propagation.
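The forward pass of {0, +1} activation binarization can be sketched in a couple of lines; Si-BNN additionally learns the thresholds used in the backward pass, which is not reproduced here, and the threshold value below is my own choice:

```python
import numpy as np

def si_binarize(x, threshold=0.5):
    """Quantize activations to {0, +1}: sparse binary representation."""
    return (x > threshold).astype(x.dtype)

x = np.array([-0.3, 0.2, 0.7, 1.4])
a = si_binarize(x)            # zeros below the threshold, ones above
sparsity = 1.0 - a.mean()     # fraction of zeroed activations
```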

Training Low Bitwidth Model with Weight Normalization for Convolutional Neural Networks

- Computer Science
- PRCV
- 2019

A method to train convolutional neural networks with low bit-width by performing weight normalization, which enables the low bit-width network to achieve a good trade-off between range and precision.
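The range/precision trade-off can be illustrated with a toy quantizer: normalizing weights first lets a fixed uniform grid match the weight distribution. A simple max-normalization stands in below for the paper's weight-normalization scheme, which may differ in detail:

```python
import numpy as np

def quantize_normalized(w, bits=4):
    """Normalize to [-1, 1], round to symmetric signed levels, rescale."""
    scale = np.max(np.abs(w)) + 1e-12     # normalization factor
    levels = 2 ** (bits - 1) - 1          # e.g. 7 levels per sign at 4 bits
    q = np.round(w / scale * levels) / levels
    return q * scale                      # back to the original range

w = np.random.default_rng(2).standard_normal(1000)
err = np.max(np.abs(quantize_normalized(w) - w))  # bounded by half a step
```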

Improving the accuracy of SqueezeNet with negligible extra computational cost

- Computer Science
- 2020 International Conference on High Performance Big Data and Intelligent Systems (HPBD&IS)
- 2020

The network is improved in terms of training method and network microarchitecture to enhance the practicality of SqueezeNet at negligible extra computational cost; these two improvements significantly improve the performance of SqueezeNet.

Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine

- Computer Science
- IEEE Journal on Emerging and Selected Topics in Circuits and Systems
- 2019

Hyperdrive is presented: a BWN accelerator that dramatically reduces I/O bandwidth by exploiting a novel binary-weight streaming approach, and that can be used for arbitrarily sized convolutional neural network architectures and input resolutions by exploiting the natural scalability of the compute units at both chip and system level.

Towards Accurate Post-training Network Quantization via Bit-Split and Stitching

- Computer Science
- ICML
- 2020

This paper proposes a Bit-Split and Stitching framework (Bit-split) for lower-bit post-training quantization with minimal accuracy degradation, which can achieve near-original model performance even when quantizing FP32 models to INT3 without fine-tuning.
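The split-then-stitch idea can be sketched as splitting a signed low-bit integer into per-bit binary planes (w = sign · Σᵢ 2ⁱ·bitᵢ) that could be optimized separately and stitched back; function names below are mine, and the paper's per-bit optimization is not reproduced:

```python
import numpy as np

def split_bits(w, bits=3):
    """Split signed integers into a sign plane and magnitude bit planes."""
    sign = np.where(w >= 0, 1, -1)
    mag = np.abs(w).astype(int)
    planes = [(mag >> i) & 1 for i in range(bits)]
    return sign, planes

def stitch(sign, planes):
    """Recombine bit planes into the original signed integers."""
    return sign * sum((2 ** i) * p for i, p in enumerate(planes))

w = np.array([[-3, 2], [1, -2]])
sign, planes = split_bits(w)
w_back = stitch(sign, planes)   # lossless round trip
```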

Towards energy-efficient convolutional neural network inference

- Computer Science
- 2019

This thesis first evaluates the capabilities of off-the-shelf software-programmable hardware before diving into specialized hardware accelerators and exploring the potential of extremely quantized CNNs, and gives special consideration to external memory bandwidth.

Learning from Binary Multiway Data: Probabilistic Tensor Decomposition and its Statistical Optimality

- Computer Science, Medicine
- J. Mach. Learn. Res.
- 2020

A multilinear Bernoulli model is proposed, a rank-constrained likelihood-based estimation method is developed, and theoretical accuracy guarantees are obtained for the parameter tensor estimation.

#### References

Showing 1–10 of 31 references

From Hashing to CNNs: Training Binary Weight Networks via Hashing

- Computer Science
- AAAI
- 2018

Based on the strong connection between inner-product-preserving hashing and binary weight networks, training binary weight networks can be intrinsically regarded as a hashing problem, and an alternating optimization method is proposed to learn the hash codes instead of directly learning binary weights.

BinaryConnect: Training Deep Neural Networks with binary weights during propagations

- Computer Science, Mathematics
- NIPS
- 2015

BinaryConnect is introduced, a method which consists in training a DNN with binary weights during the forward and backward propagations, while retaining the precision of the stored weights in which gradients are accumulated; near state-of-the-art results with BinaryConnect are obtained on the permutation-invariant MNIST, CIFAR-10, and SVHN.
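The core BinaryConnect mechanic, binarized weights in the propagations but full-precision accumulation, can be sketched on a toy regression problem (sizes, learning rate, and the clipping choice are mine; the papers' networks and tasks are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(0)
s = np.where(rng.standard_normal(4) >= 0, 1.0, -1.0)   # target binary weights
x = rng.standard_normal((64, 4))
y = x @ s

W = 0.1 * rng.standard_normal(4)      # full-precision "master" copy
lr = 0.05
for _ in range(300):
    Wb = np.where(W >= 0, 1.0, -1.0)  # binarize for forward/backward pass
    grad = x.T @ (x @ Wb - y) / len(x)
    W = np.clip(W - lr * grad, -1.0, 1.0)  # accumulate in full precision

loss = np.mean((x @ np.where(W >= 0, 1.0, -1.0) - y) ** 2)
```

The key point is that the small gradient steps accumulate in `W` until its sign flips, which a directly binarized weight could never do.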

Compressing Deep Convolutional Networks using Vector Quantization

- Computer Science
- ArXiv
- 2014

This paper is able to achieve 16–24 times compression of the network with only 1% loss of classification accuracy using a state-of-the-art CNN, and finds that, in terms of compressing the most storage-demanding densely connected layers, vector quantization methods have a clear gain over existing matrix factorization methods.
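The vector-quantization idea can be sketched with plain k-means: cluster weight sub-vectors and store only centroid indices plus a small codebook. The sub-vector length, codebook size, and k-means implementation below are my own illustrative choices, not the paper's exact variants:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means: returns centroids and assignments."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - C[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            if np.any(assign == j):
                C[j] = X[assign == j].mean(0)
    return C, assign

W = np.random.default_rng(3).standard_normal((256, 4))  # 4-dim sub-vectors
C, idx = kmeans(W, k=16)
W_hat = C[idx]                                          # decompressed weights

# Rough compression: 4-bit indices + fp32 codebook vs. fp32 weights.
ratio = (W.size * 32) / (idx.size * 4 + C.size * 32)
```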

Two-Step Quantization for Low-bit Neural Networks

- Computer Science
- 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018

A simple yet effective Two-Step Quantization (TSQ) framework is proposed, by decomposing the network quantization problem into two steps: code learning and transformation function learning based on the learned codes, and the sparse quantization method for code learning.

Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition

- Computer Science
- ICLR
- 2015

A simple two-step approach for speeding up convolution layers within large convolutional neural networks, based on tensor decomposition and discriminative fine-tuning, is proposed, achieving significant CPU speed-ups at the cost of small accuracy drops.
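The CP idea is to approximate a 4-D conv kernel K[t, s, h, w] by a sum of R rank-1 terms, turning one dense convolution into a pipeline of four cheap ones. A sketch that builds a rank-R kernel from factor matrices and compares per-pixel multiply counts (the ALS fitting and fine-tuning steps are not reproduced, and the sizes are illustrative):

```python
import numpy as np

T, S, H, W, R = 32, 16, 3, 3, 4          # out-ch, in-ch, kernel h/w, CP rank
rng = np.random.default_rng(4)
a, b, c, d = [rng.standard_normal((n, R)) for n in (T, S, H, W)]

# K = sum_r a[:, r] (x) b[:, r] (x) c[:, r] (x) d[:, r]
K = np.einsum('tr,sr,hr,wr->tshw', a, b, c, d)

dense_mults = T * S * H * W      # per output pixel, direct convolution
cp_mults = R * (S + H + W + T)   # per output pixel, four 1-D/1x1 stages
print(dense_mults, cp_mults)     # 4608 216
```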

Fixed Point Quantization of Deep Convolutional Networks

- Computer Science, Mathematics
- ICML
- 2016

This paper proposes a quantizer design for fixed-point implementation of DCNs, formulates and solves an optimization problem to identify the optimal fixed-point bit-width allocation across DCN layers, and demonstrates that fine-tuning can further enhance the accuracy of fixed-point DCNs beyond that of the original floating-point model.
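A basic fixed-point quantizer, parameterized by total bit-width and fractional length, can be sketched as follows; the paper's per-layer bit-width optimization is not reproduced, and the parameter values are mine:

```python
import numpy as np

def fixed_point(x, bits=8, frac=6):
    """Round to a signed fixed-point grid with `frac` fractional bits."""
    step = 2.0 ** -frac
    lo = -(2.0 ** (bits - 1)) * step          # most negative code
    hi = (2.0 ** (bits - 1) - 1) * step       # most positive code
    return np.clip(np.round(x / step) * step, lo, hi)

x = np.array([0.1, -1.37, 3.0])
q = fixed_point(x)   # values snapped to 1/64 steps, 3.0 saturates at hi
```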

Accelerating Convolutional Neural Networks for Mobile Applications

- Computer Science
- ACM Multimedia
- 2016

An efficient and effective approach is proposed to accelerate the test-phase computation of CNNs based on low-rank and group sparse tensor decomposition, which achieves significant reduction in computational complexity, at the cost of negligible loss in accuracy.

Learning Accurate Low-Bit Deep Neural Networks with Stochastic Quantization

- Computer Science
- BMVC
- 2017

The stochastic quantization (SQ) algorithm for learning accurate low-bit DNNs quantizes a portion of elements/filters to low-bit with a stochastic probability inversely proportional to the quantization error, while keeping the other portion unchanged at full precision.
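One SQ iteration can be sketched like this: per-filter quantization errors determine inverse-error sampling probabilities, and only the sampled filters are binarized. The 50% ratio and the probability scaling below are my own simplifications, not the paper's schedule:

```python
import numpy as np

rng = np.random.default_rng(5)
W = rng.standard_normal((8, 16))                       # 8 filters
Wb = np.sign(W) * np.mean(np.abs(W), axis=1, keepdims=True)  # binarized

err = np.linalg.norm(W - Wb, axis=1) / np.linalg.norm(W, axis=1)
p = 1.0 / (err + 1e-12)
p /= p.sum()                       # probabilities inverse to quant. error

quantize = rng.random(8) < 0.5 * len(W) * p   # ~half the filters, on average
W_mixed = np.where(quantize[:, None], Wb, W)  # mixed-precision weights
```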

Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications

- Computer Science
- ICLR
- 2016

A simple and effective scheme to compress the entire CNN, called one-shot whole network compression, which addresses the important implementation-level issue of 1×1 convolution, a key operation in the inception module of GoogLeNet as well as in CNNs compressed by the proposed scheme.

Speeding up Convolutional Neural Networks with Low Rank Expansions

- Computer Science
- BMVC
- 2014

Two simple schemes for drastically speeding up convolutional neural networks are presented, achieved by exploiting cross-channel or filter redundancy to construct a low rank basis of filters that are rank-1 in the spatial domain.
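The spatial rank-1 idea can be sketched with an SVD: a k×k filter that factors as an outer product v·hᵀ can be applied as a vertical 1-D pass followed by a horizontal one. A toy single-filter example (the schemes in the paper construct a shared low-rank basis across filters, which is not reproduced):

```python
import numpy as np

rng = np.random.default_rng(6)
f = rng.standard_normal((5, 5))          # a 5x5 spatial filter

U, s, Vt = np.linalg.svd(f)
v = U[:, :1] * np.sqrt(s[0])             # 5x1 vertical component
h = (Vt[:1, :] * np.sqrt(s[0])).T        # 5x1 horizontal component
f1 = v @ h.T                             # best rank-1 approximation of f

rel_err = np.linalg.norm(f - f1) / np.linalg.norm(f)
```

Applying `v` then `h` costs 2k multiplies per pixel instead of k², which is where the speed-up comes from.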