Neural Networks Weights Quantization: Target None-retraining Ternary (TNT)

  • Tianyu Zhang, Lei Zhu, Qian Zhao, Kilho Shin
  • Published 1 December 2019
  • Computer Science
  • 2019 Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS Edition (EMC2-NIPS)
Quantizing the weights of deep neural networks (DNNs) has proven effective for implementing DNNs on edge devices such as mobile phones, ASICs and FPGAs, because such devices lack the resources to support computation involving millions of high-precision weights and multiply-accumulate operations. This paper proposes a novel method to compress vectors of high-precision DNN weights into ternary vectors, namely a cosine similarity based target non-retraining ternary…
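The abstract is truncated, so the following is only a minimal sketch of what cosine-similarity-based ternarization without retraining could look like, not necessarily the authors' exact procedure. It assumes the target is the vector in {-1, 0, +1}^n with the highest cosine similarity to the weight vector. For a fixed number k of nonzero entries, that vector keeps the signs of the k largest-magnitude weights, so scanning k over the sorted magnitudes is enough. The function name `ternarize_by_cosine` is hypothetical.

```python
import math


def ternarize_by_cosine(weights):
    """Return (ternary, cosine): a {-1, 0, +1} vector maximizing cosine
    similarity with `weights`, and the similarity achieved.

    Illustrative sketch under the assumption stated above; not taken
    from the paper's text.
    """
    n = len(weights)
    norm = math.sqrt(sum(w * w for w in weights))
    if norm == 0.0:
        # An all-zero weight vector has no direction to match.
        return [0] * n, 0.0

    # Indices ordered by decreasing magnitude.
    order = sorted(range(n), key=lambda i: abs(weights[i]), reverse=True)

    best_k, best_cos, prefix = 0, -1.0, 0.0
    for k, i in enumerate(order, start=1):
        prefix += abs(weights[i])
        # Cosine between `weights` and the sign vector of the top-k entries.
        cos = prefix / (math.sqrt(k) * norm)
        if cos > best_cos:
            best_k, best_cos = k, cos

    ternary = [0] * n
    for i in order[:best_k]:
        ternary[i] = 1 if weights[i] >= 0 else -1
    return ternary, best_cos


if __name__ == "__main__":
    t, c = ternarize_by_cosine([0.9, -0.8, 0.05, 0.0, -0.1])
    print(t, round(c, 4))
```

Sorting dominates the cost, so the scan takes O(n log n) time per weight vector and needs no training data or gradient steps.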
EasyQuant: Post-training Quantization via Scale Optimization
This paper presents an efficient and simple post-training method via scale optimization, named EasyQuant (EQ), that can obtain accuracy comparable to training-based methods, and shows that EQ outperforms the TensorRT method and can achieve near-INT8 accuracy at 7-bit width post-training.
One Loss for All: Deep Hashing with a Single Cosine Similarity based Learning Objective
It is shown that maximizing the cosine similarity between the continuous codes and their corresponding binary orthogonal codes can ensure both hash code discriminativeness and quantization error minimization, leading to a one-loss deep hashing model that removes the hassle of tuning the weights of various losses.
Ternary Hashing
This work demonstrates that the proposed ternary hashing compares favorably with binary hashing methods, with consistent improvements in retrieval mean average precision (mAP) ranging from 1% to 5.9% on the CIFAR10, NUS-WIDE and ImageNet100 datasets.


Ternary Neural Networks with Fine-Grained Quantization
A novel fine-grained quantization (FGQ) method is proposed to ternarize pre-trained full-precision models while also constraining activations to 8 and 4 bits, which enables a full 8/4-bit inference pipeline with the best reported accuracy using ternary weights on the ImageNet dataset.
Ternary neural networks for resource-efficient AI applications
This paper proposes ternary neural networks (TNNs) in order to make deep learning more resource-efficient, and designs a purpose-built hardware architecture for TNNs and implements it on FPGA and ASIC.
BinaryConnect: Training Deep Neural Networks with binary weights during propagations
BinaryConnect is introduced, a method which consists in training a DNN with binary weights during the forward and backward propagations, while retaining precision of the stored weights in which gradients are accumulated, and near state-of-the-art results with BinaryConnect are obtained on the permutation-invariant MNIST, CIFAR-10 and SVHN.
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
The Binary-Weight-Network version of AlexNet is compared with recent network binarization methods, BinaryConnect and BinaryNets, and outperforms these methods by large margins on ImageNet, more than 16% in top-1 accuracy.
Binarized Neural Networks
A binary matrix multiplication GPU kernel is written with which it is possible to run the MNIST BNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy.
Ternary Weight Networks
  • Fengfu Li, Bin Liu
  • Mathematics, Computer Science
  • 2016
TWNs, neural networks with weights constrained to +1, 0 and -1, are introduced; they have stronger expressive abilities than the recently proposed binary precision counterparts and are thus more effective than the latter.
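For comparison, threshold-based ternarization in the style usually associated with TWNs can be sketched as follows. The summary above does not state the rule, so the details here are assumptions: a threshold of 0.7 times the mean absolute weight, and a scale equal to the mean magnitude of the weights that survive the threshold. The function name `ternarize_by_threshold` and the parameter `delta_factor` are hypothetical.

```python
def ternarize_by_threshold(weights, delta_factor=0.7):
    """Return (ternary, alpha): a {-1, 0, +1} vector and a scale so that
    alpha * ternary approximates `weights`.

    Assumed rule (not given in the summary above):
    delta = delta_factor * mean(|w|); alpha = mean(|w|) over |w| > delta.
    """
    n = len(weights)
    if n == 0:
        return [], 0.0

    delta = delta_factor * sum(abs(w) for w in weights) / n

    # Weights inside [-delta, delta] are zeroed; the rest keep their sign.
    ternary = [1 if w > delta else -1 if w < -delta else 0 for w in weights]

    kept = [abs(w) for w in weights if abs(w) > delta]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return ternary, alpha


if __name__ == "__main__":
    t, alpha = ternarize_by_threshold([0.9, -0.8, 0.05, 0.0, -0.1])
    print(t, alpha)
```

Unlike a search over all sparsity levels, this rule fixes the threshold from a single statistic of the weights, which keeps it cheap but makes the result depend on the chosen factor.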
Very Deep Convolutional Networks for Large-Scale Image Recognition
This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.