Training Binary Weight Networks via Semi-Binary Decomposition

@inproceedings{Hu2018TrainingBW,
  title={Training Binary Weight Networks via Semi-Binary Decomposition},
  author={Qinghao Hu and Gang Li and Peisong Wang and Yifan Zhang and Jian Cheng},
  booktitle={ECCV},
  year={2018}
}
Recently, binary weight networks have attracted a lot of attention due to their high computational efficiency and small parameter size. Yet they still suffer from large accuracy drops because of their limited representation capacity. In this paper, we propose a novel semi-binary decomposition method which decomposes a matrix into two binary matrices and a diagonal matrix. Since the matrix product of binary matrices takes more numerical values than a single binary matrix, the proposed semi-binary…
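The decomposed form itself is easy to write down. Below is a minimal numpy sketch of the semi-binary structure W ≈ B1 D B2, with B1 and B2 binary ({-1, +1}) factors and D diagonal; the shapes, random initialization, and variable names are illustrative assumptions, and the alternating optimization the paper uses to fit the factors to a pretrained weight matrix is not reproduced here.

```python
import numpy as np

# Minimal sketch of the semi-binary form W ~= B1 @ D @ B2, where B1 and B2 are
# {-1, +1} matrices and D is diagonal. Shapes and the random values are
# illustrative only; the paper fits B1, D, B2 to a pretrained weight matrix.
m, k, n = 64, 32, 64
rng = np.random.default_rng(0)

B1 = np.where(rng.standard_normal((m, k)) >= 0, 1.0, -1.0)  # binary factor in {-1, +1}
B2 = np.where(rng.standard_normal((k, n)) >= 0, 1.0, -1.0)  # binary factor in {-1, +1}
d  = rng.random(k)                                           # positive diagonal scaling

W_approx = B1 @ np.diag(d) @ B2   # the product takes many more distinct values
                                  # than a single binary matrix times one scalar
```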
Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration
TLDR
A novel encoding scheme using {−1,+1} to decompose quantized neural networks (QNNs) into multi-branch binary networks, which can be efficiently implemented by bitwise operations (i.e., xnor and bitcount) to achieve model compression, computational acceleration, and resource saving.
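As an illustration of the encoding idea (not the paper's code), the sketch below rewrites an M-bit, odd-valued quantized weight as a weighted sum of {-1, +1} branches, which is what allows each branch to be evaluated with xnor/bitcount kernels; the toy values and variable names are assumptions.

```python
import numpy as np

# An M-bit odd-valued quantized weight q in {-(2^M - 1), ..., -1, +1, ..., 2^M - 1}
# can be rewritten as q = sum_{i=0}^{M-1} 2^i * b_i with b_i in {-1, +1}.
M = 3
q = np.array([-7, -3, 1, 5, 7])                 # example odd-valued quantized weights
u = (q + (2**M - 1)) // 2                       # shift to unsigned range [0, 2^M - 1]
bits = (u[:, None] >> np.arange(M)) & 1         # plain {0, 1} bit planes
branches = 2 * bits - 1                         # {-1, +1} branches b_i
reconstructed = branches @ (2 ** np.arange(M))  # equals q again
assert np.array_equal(reconstructed, q)
```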
Matrix and tensor decompositions for training binary neural networks
TLDR
This paper improves the training of binary neural networks, in which both activations and weights are binary, by parametrizing the weight tensor of each layer with a matrix or tensor decomposition, and it significantly outperforms existing methods on challenging tasks.
Binary Neural Networks: A Survey
TLDR
A comprehensive survey of algorithms proposed for binary neural networks, mainly categorized into native solutions that directly conduct binarization and optimized ones that use techniques such as minimizing the quantization error, improving the network loss function, and reducing the gradient error.
Sparsity-Inducing Binarized Neural Networks
TLDR
This work proposes the Sparsity-inducing Binarized Neural Network (Si-BNN), which quantizes activations to either 0 or +1, introducing sparsity into the binary representation, and adds trainable thresholds to the backward function of binarization to guide gradient propagation.
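A rough sketch of what such a sparsity-inducing binarizer could look like is given below; the threshold value, the gradient window, and the surrogate backward rule are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

# Forward: map activations to {0, 1} with a threshold (sparse binary codes).
def si_binarize_forward(x, threshold=0.5):
    return (x > threshold).astype(x.dtype)

# Backward: straight-through style surrogate that only passes gradients for
# pre-activations lying within `window` of the threshold.
def si_binarize_backward(x, grad_out, threshold=0.5, window=1.0):
    mask = (np.abs(x - threshold) <= window).astype(x.dtype)
    return grad_out * mask
```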
Training Low Bitwidth Model with Weight Normalization for Convolutional Neural Networks
TLDR
A method to train convolutional neural networks with low bitwidth weights by performing weight normalization, which enables the low-bitwidth network to achieve a good trade-off between range and precision.
Improving the accuracy of SqueezeNet with negligible extra computational cost
TLDR
The network is improved in terms of training method and network microarchitecture to enhance the practicality of SqueezeNet at negligible extra computational cost, and these two improvements significantly improve SqueezeNet's performance.
Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine
TLDR
Hyperdrive is presented: a binary-weight network (BWN) accelerator that dramatically reduces I/O bandwidth through a novel binary-weight streaming approach and can handle arbitrarily sized convolutional neural network architectures and input resolutions by exploiting the natural scalability of its compute units at both chip level and system level.
Towards Accurate Post-training Network Quantization via Bit-Split and Stitching
TLDR
This paper proposes a Bit-Split and Stitching framework (Bit-split) for lower-bit post-training quantization with minimal accuracy degradation, which can achieve near-original model performance even when quantizing FP32 models to INT3 without fine-tuning.
Towards energy-efficient convolutional neural network inference
TLDR
This thesis first evaluates the capabilities of off-the-shelf software-programmable hardware before diving into specialized hardware accelerators and exploring the potential of extremely quantized CNNs, and gives special consideration to external memory bandwidth.
Learning from Binary Multiway Data: Probabilistic Tensor Decomposition and its Statistical Optimality
TLDR
A multilinear Bernoulli model is proposed, a rank-constrained likelihood-based estimation method is developed, and theoretical accuracy guarantees are obtained for the parameter tensor estimation.

References

Showing 1-10 of 31 references
From Hashing to CNNs: Training Binary Weight Networks via Hashing
TLDR
The strong connection between inner-product preserving hashing and binary weight networks is revealed, showing that training binary weight networks can be intrinsically regarded as a hashing problem, and an alternating optimization method is proposed to learn the hash codes instead of directly learning binary weights.
BinaryConnect: Training Deep Neural Networks with binary weights during propagations
TLDR
BinaryConnect is introduced, a method that trains a DNN with binary weights during the forward and backward propagations while retaining the precision of the stored weights in which gradients are accumulated; near state-of-the-art results are obtained with BinaryConnect on permutation-invariant MNIST, CIFAR-10, and SVHN.
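The training recipe is simple enough to sketch in a few lines: binarize the latent weights for propagation, but accumulate gradient updates in full precision. The toy least-squares objective and hyperparameters below are illustrative; the paper applies this per layer inside a full DNN.

```python
import numpy as np

# BinaryConnect-style loop for a single weight matrix on a toy regression task.
rng = np.random.default_rng(0)
W = rng.uniform(-1, 1, size=(8, 4))              # real-valued "latent" weights
X = rng.standard_normal((32, 8))
Y = rng.standard_normal((32, 4))
lr = 0.01

for step in range(100):
    Wb = np.where(W >= 0, 1.0, -1.0)             # binarized weights used in propagation
    err = X @ Wb - Y
    grad = X.T @ err / len(X)                    # gradient w.r.t. the binary weights
    W -= lr * grad                               # accumulate update in full precision
    W = np.clip(W, -1.0, 1.0)                    # keep latent weights in [-1, 1]
```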
Compressing Deep Convolutional Networks using Vector Quantization
TLDR
This paper achieves 16-24x compression of a state-of-the-art CNN with only a 1% loss of classification accuracy, and finds that for compressing the most storage-demanding densely connected layers, vector quantization methods have a clear advantage over existing matrix factorization methods.
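A minimal sketch of the underlying idea, assuming a plain k-means codebook over weight sub-vectors (the paper's product-quantization setting is more elaborate, and the sub-vector length, codebook size, and Lloyd iterations below are illustrative choices):

```python
import numpy as np

# Vector-quantize a dense layer: cut rows into sub-vectors, replace each by its
# nearest centroid, and store only centroid indices plus the codebook.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 128))              # dense-layer weight matrix
d, k = 8, 16                                     # sub-vector length, codebook size

subvecs = W.reshape(-1, d)
codebook = subvecs[rng.choice(len(subvecs), k, replace=False)].copy()

for _ in range(10):                              # a few Lloyd iterations
    dists = ((subvecs[:, None, :] - codebook[None]) ** 2).sum(-1)
    assign = dists.argmin(1)
    for j in range(k):
        members = subvecs[assign == j]
        if len(members):
            codebook[j] = members.mean(0)

W_quantized = codebook[assign].reshape(W.shape)  # reconstruction from the codes
```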
Two-Step Quantization for Low-bit Neural Networks
TLDR
A simple yet effective Two-Step Quantization (TSQ) framework is proposed that decomposes the network quantization problem into two steps, code learning and transformation function learning based on the learned codes, with a sparse quantization method used for code learning.
Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition
TLDR
A simple two-step approach for speeding up convolution layers within large convolutional neural networks, based on tensor decomposition and discriminative fine-tuning, is proposed, yielding high CPU speedups with only small accuracy drops, especially for the smaller of the two networks evaluated.
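The decomposition itself is a standard CP factorization of the 4-way convolution kernel, which lets one layer be replaced by a chain of cheaper convolutions. Below is a minimal numpy sketch of the factorized form, with assumed shapes and randomly initialized factors standing in for factors fitted to a pretrained kernel.

```python
import numpy as np

# CP (rank-R) factorization of a kernel K[i, j, s, t]
# (spatial i, j; input channel s; output channel t) into four factor matrices.
d, S, T, R = 3, 16, 32, 8
rng = np.random.default_rng(0)
Ax = rng.standard_normal((d, R))                 # spatial (vertical) factor
Ay = rng.standard_normal((d, R))                 # spatial (horizontal) factor
As = rng.standard_normal((S, R))                 # input-channel factor
At = rng.standard_normal((T, R))                 # output-channel factor

# K[i, j, s, t] = sum_r Ax[i, r] * Ay[j, r] * As[s, r] * At[t, r]
K = np.einsum('ir,jr,sr,tr->ijst', Ax, Ay, As, At)
```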
Fixed Point Quantization of Deep Convolutional Networks
TLDR
This paper proposes a quantizer design for fixed point implementation of DCNs, formulates and solves an optimization problem to identify the optimal fixed point bit-width allocation across DCN layers, and demonstrates that fine-tuning can further enhance the accuracy of fixed point DCNs beyond that of the original floating point model.
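For intuition, here is a minimal sketch of the kind of uniform fixed-point quantizer such a design builds on; the helper name and the rule for choosing the fractional length are assumptions, and the paper's per-layer bit-width optimization and fine-tuning are not reproduced.

```python
import numpy as np

# Uniform fixed-point quantizer: `bits` total bits, fractional length chosen
# so that the observed dynamic range of x fits.
def fixed_point_quantize(x, bits=8):
    max_abs = np.max(np.abs(x)) + 1e-12
    int_bits = int(np.ceil(np.log2(max_abs))) + 1        # sign + integer part
    frac_bits = bits - int_bits                           # remaining bits for the fraction
    step = 2.0 ** (-frac_bits)
    q_min, q_max = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = np.clip(np.round(x / step), q_min, q_max)
    return q * step                                       # dequantized fixed-point value
```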
Accelerating Convolutional Neural Networks for Mobile Applications
TLDR
An efficient and effective approach is proposed to accelerate the test-phase computation of CNNs based on low-rank and group sparse tensor decomposition, which achieves a significant reduction in computational complexity with negligible loss in accuracy.
Learning Accurate Low-Bit Deep Neural Networks with Stochastic Quantization
TLDR
The stochastic quantization (SQ) algorithm for learning accurate low-bit DNNs quantizes a portion of the elements/filters to low bit-width with a stochastic probability inversely proportional to the quantization error, while keeping the remaining portion unchanged at full precision.
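A rough sketch of that selection rule, with an assumed BWN-style binarizer, an assumed relative-error measure, and an assumed quantization ratio, might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)
filters = rng.standard_normal((16, 27))          # 16 filters, flattened 3x3x3 kernels

def binarize(f):
    alpha = np.mean(np.abs(f))                   # per-filter scaling factor
    return alpha * np.where(f >= 0, 1.0, -1.0)

errors = np.array([np.linalg.norm(f - binarize(f)) / np.linalg.norm(f) for f in filters])
probs = 1.0 / (errors + 1e-12)
probs /= probs.sum()                             # higher probability for smaller error

ratio = 0.5                                      # fraction of filters quantized this round
chosen = rng.choice(len(filters), size=int(ratio * len(filters)), replace=False, p=probs)
quantized = filters.copy()
quantized[chosen] = np.array([binarize(filters[i]) for i in chosen])
```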
Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications
TLDR
A simple and effective scheme to compress an entire CNN, called one-shot whole network compression, which addresses the important implementation-level issue of 1x1 convolution, a key operation in the inception module of GoogLeNet as well as in CNNs compressed by the proposed scheme.
Speeding up Convolutional Neural Networks with Low Rank Expansions
TLDR
Two simple schemes for drastically speeding up convolutional neural networks are presented, which exploit cross-channel or filter redundancy to construct a low-rank basis of filters that are rank-1 in the spatial domain.
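The spatial rank-1 idea can be illustrated with an SVD of a single 2-D kernel, which splits one k x k convolution into a k x 1 followed by a 1 x k convolution; the cross-channel grouping the paper also exploits is omitted from this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
kernel = rng.standard_normal((5, 5))

U, s, Vt = np.linalg.svd(kernel)
col = U[:, 0] * np.sqrt(s[0])                    # vertical (k x 1) filter
row = Vt[0] * np.sqrt(s[0])                      # horizontal (1 x k) filter
rank1 = np.outer(col, row)                       # best rank-1 approximation of the kernel
```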