FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer
@inproceedings{Lin2022FQViTPQ,
  title     = {FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer},
  author    = {Yang Lin and Tianyu Zhang and Peiqin Sun and Zheng Li and Shuchang Zhou},
  booktitle = {IJCAI},
  year      = {2022}
}
Network quantization significantly reduces model inference complexity and has been widely used in real-world deployments. However, most existing quantization methods have been developed mainly on Convolutional Neural Networks (CNNs), and suffer severe degradation when applied to fully quantized vision transformers. In this work, we demonstrate that many of these difficulties arise because of serious inter-channel variation in LayerNorm inputs, and present Power-of-Two Factor (PTF), a…
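To make the core idea concrete, below is a minimal PyTorch-style sketch of Power-of-Two Factor quantization for LayerNorm inputs: all channels share one layer-wise scale and zero point, and each channel additionally gets an integer power-of-two factor chosen at calibration time so that its rescaling can later be realized with a bit shift. The function name, the calibration heuristic, and the cap `k_max` are illustrative assumptions, not the authors' implementation.

```python
import torch

def ptf_quantize(x, n_bits=8, k_max=3):
    """Sketch of Power-of-Two Factor (PTF) quantization for LayerNorm inputs.

    All channels share one layer-wise scale `s` and zero point `zp`; channel c
    additionally gets an integer factor alpha_c, so its effective step size is
    2**alpha_c * s, which integer-only kernels can realize with bit shifts.

    x: (tokens, channels) calibration activations.
    """
    qmax = 2 ** n_bits - 1

    # Layer-wise scale and zero point from the global activation range.
    x_min, x_max = x.min(), x.max()
    s = (x_max - x_min).clamp(min=1e-8) / qmax
    zp = torch.round(-x_min / s)

    # Per-channel power-of-two factor (heuristic): smallest alpha in [0, k_max]
    # whose enlarged step 2**alpha * s covers that channel's own range.
    ch_range = x.max(dim=0).values - x.min(dim=0).values
    alpha = torch.ceil(torch.log2((ch_range / qmax).clamp(min=1e-8) / s))
    alpha = alpha.clamp(0, k_max)

    scale = s * 2 ** alpha                           # per-channel effective step
    x_q = torch.clamp(torch.round(x / scale + zp), 0, qmax)
    x_hat = (x_q - zp) * scale                       # simulated dequantization
    return x_q, x_hat, (s, zp, alpha)
```

Restricting the per-channel factors to powers of two is what keeps a fully quantized LayerNorm cheap: the channel-wise rescaling reduces to a shift instead of a per-channel floating-point multiply.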
References
Post-Training Quantization for Vision Transformer
- Computer Science, NeurIPS, 2021
This paper presents an effective post-training quantization algorithm for reducing the memory storage and computational costs of vision transformers, and thoroughly analyzes the relationship between quantization loss of different layers and the feature diversity.
Fully Quantized Network for Object Detection
- Computer Science, CVPR, 2019
This paper applies novel techniques to produce fully quantized 4-bit detectors based on RetinaNet and Faster R-CNN, and shows that these achieve state-of-the-art performance for quantized detectors.
Towards Accurate Post-training Network Quantization via Bit-Split and Stitching
- Computer Science, ICML, 2020
This paper proposes a Bit-Split and Stitching framework (Bit-split) for lower-bit post-training quantization with minimal accuracy degradation, which can achieve near-original model performance even when quantizing FP32 models to INT3 without fine-tuning.
Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer
- Computer Science, AAAI, 2022
Evo-ViT is presented, a self-motivated slow-fast token evolution approach for vision transformers that can accelerate vanilla transformers of both flat and deep-narrow structures from the very beginning of the training process.
Training data-efficient image transformers & distillation through attention
- Computer Science, ICML, 2021
This work produces a competitive convolution-free transformer by training on ImageNet only, and introduces a teacher-student strategy specific to transformers that relies on a distillation token, ensuring that the student learns from the teacher through attention.
Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding
- Computer Science, ICLR, 2016
This work introduces "deep compression", a three-stage pipeline of pruning, trained quantization, and Huffman coding whose stages work together to reduce the storage requirements of neural networks by 35x to 49x without affecting their accuracy.
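As a rough illustration of the first two stages (Huffman coding of the resulting indices is omitted), here is a toy NumPy/scikit-learn sketch; the sparsity level, cluster count, and function name are arbitrary choices, and the retraining of pruned weights and shared centroids described in the paper is not shown.

```python
import numpy as np
from sklearn.cluster import KMeans

def prune_and_share(w, sparsity=0.9, n_clusters=16):
    """Toy sketch of the first two stages: magnitude pruning, then k-means
    weight sharing. Huffman coding of the cluster indices and the retraining
    steps are omitted; parameter values are arbitrary.
    """
    flat = w.flatten()

    # Stage 1: magnitude pruning -- zero out the smallest weights.
    threshold = np.quantile(np.abs(flat), sparsity)
    mask = np.abs(flat) > threshold

    # Stage 2: weight sharing -- cluster surviving weights into a small codebook.
    kept = flat[mask].reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(kept)
    codebook = km.cluster_centers_.ravel()

    shared = np.zeros_like(flat)
    shared[mask] = codebook[km.labels_]            # replace each weight by its centroid
    return shared.reshape(w.shape), mask.reshape(w.shape), codebook
```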
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
- Computer Science, ICLR, 2021
Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
LeViT: a Vision Transformer in ConvNet’s Clothing for Faster Inference
- Computer Science, ICCV, 2021
This work designs a family of image classification architectures that optimize the trade-off between accuracy and efficiency in a high-speed regime, and proposes LeViT, a hybrid neural network for fast-inference image classification that significantly outperforms existing convnets and vision transformers.
I-BERT: Integer-only BERT Quantization
- Computer Science, ICML, 2021
This work proposes a novel integer-only quantization scheme for Transformer based models that quantizes the entire inference process, and demonstrates how to approximate nonlinear operations in Transformer architectures, e.g., GELU, Softmax, and Layer Normalization, with lightweight integer computations.
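The common building block behind such integer-only approximations is evaluating a low-order polynomial directly on the quantized integers while tracking the output scale separately. A minimal NumPy sketch of a second-order kernel in that spirit is shown below; the function name and interface are illustrative assumptions, not I-BERT's code.

```python
import numpy as np

def int_poly2(q, scale, a, b, c):
    """Evaluate p(x) = a*(x + b)**2 + c using only integer arithmetic on q,
    where the real-valued input is x = scale * q.

    Returns integer outputs q_out and their output scale, so that
    a*(scale*q + b)**2 + c  ~=  scale_out * q_out.
    """
    q_b = np.floor(b / scale).astype(np.int64)              # fold b into the integer domain
    q_c = np.floor(c / (a * scale ** 2)).astype(np.int64)   # fold c into the output scale
    scale_out = a * scale ** 2
    q_out = (q.astype(np.int64) + q_b) ** 2 + q_c
    return q_out, scale_out
```

Roughly speaking, GELU and Softmax can then be assembled from such kernels applied to polynomial fits of erf and exp on bounded ranges, while LayerNorm additionally requires an integer square root, with the output scale carried alongside the integers until a final dequantization.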
A Deep Look into Logarithmic Quantization of Model Parameters in Neural Networks
- Computer Science, IAIT, 2018
This paper proposes a new logarithmic quantization algorithm that mitigates accuracy degradation in neural networks containing small layers, and achieves the lowest accuracy loss on GoogLeNet after direct quantization among the compared quantized counterparts.
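For reference, here is a minimal NumPy sketch of the basic idea behind logarithmic quantization, rounding each weight to the nearest power of two so that multiplications can be replaced by bit shifts; the exponent window and the handling of near-zero weights are simplified assumptions rather than the paper's exact scheme.

```python
import numpy as np

def log2_quantize(w, n_bits=4):
    """Toy sketch of logarithmic (power-of-two) quantization: each weight is
    replaced by sign(w) * 2**e for an integer exponent e, so a multiply by w
    becomes a bit shift. The exponent window below is an illustrative choice.
    """
    sign = np.sign(w)
    mag = np.abs(w) + 1e-12                        # avoid log2(0)
    e = np.round(np.log2(mag)).astype(np.int32)    # nearest power of two

    # Keep only a (2**n_bits)-level window of exponents below the largest one;
    # a real implementation would usually map out-of-range weights to zero.
    e_max = int(e.max())
    e = np.clip(e, e_max - (2 ** n_bits - 1), e_max)
    return sign * np.power(2.0, e), e
```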