Corpus ID: 246063758

TerViT: An Efficient Ternary Vision Transformer

@article{Xu2022TerViTAE,
  title={TerViT: An Efficient Ternary Vision Transformer},
  author={Sheng Xu and Yanjing Li and Teli Ma and Bo-Wen Zeng and Baochang Zhang and Peng Gao and Jinhu Lv},
  journal={ArXiv},
  year={2022},
  volume={abs/2201.08050}
}
Vision transformers (ViTs) have demonstrated great potential in various visual tasks, but suffer from high computational and memory costs when deployed on resource-constrained devices. In this paper, we introduce a ternary vision transformer (TerViT) to ternarize the weights in ViTs, which is challenging due to the large gap between the loss surfaces of real-valued and ternary parameters. To address this issue, we introduce a progressive training scheme by first training 8-bit transformers and… 
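
The abstract leaves the quantizer itself unspecified, so as a rough illustration, here is a minimal PyTorch sketch of threshold-based weight ternarization in the style of Ternary Weight Networks; the function name, the per-tensor granularity, and the 0.7 * mean(|W|) threshold are assumptions, not TerViT's confirmed method.

```python
import torch

def ternarize(w: torch.Tensor) -> torch.Tensor:
    """Map real-valued weights to {-alpha, 0, +alpha} (TWN-style heuristic)."""
    delta = 0.7 * w.abs().mean()                    # assumed per-tensor threshold
    mask = (w.abs() > delta).float()                # keep weights above the threshold
    alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1.0)  # scale of kept weights
    return alpha * torch.sign(w) * mask             # ternary approximation of w
```

In the progressive scheme the abstract describes, a quantizer like this would only be applied after an 8-bit warm-up stage, which narrows the loss-surface gap before the final ternary fine-tuning.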


References

Showing 1-10 of 26 references

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

  • Ze Liu, Yutong Lin, B. Guo
  • Computer Science
    2021 IEEE/CVF International Conference on Computer Vision (ICCV)
  • 2021
TLDR
A hierarchical Transformer whose representation is computed with shifted windows; the design has the flexibility to model at various scales, has linear computational complexity with respect to image size, and also proves beneficial for all-MLP architectures.
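
As a rough illustration of why windowed attention scales linearly, below is a minimal sketch of the non-overlapping window-partition step: attention is computed only inside each fixed-size window, so cost grows with the number of windows rather than quadratically with image size. The shifted variant additionally rolls the feature map (e.g. with torch.roll) before partitioning; the function signature here is an assumption.

```python
import torch

def window_partition(x: torch.Tensor, window_size: int) -> torch.Tensor:
    """Split a (B, H, W, C) feature map into non-overlapping windows,
    returning (num_windows * B, window_size * window_size, C)."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, C)
```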

Post-Training Quantization for Vision Transformer

TLDR
This paper presents an effective post-training quantization algorithm for reducing the memory storage and computational costs of vision transformers, and thoroughly analyzes the relationship between quantization loss of different layers and the feature diversity.
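
Setting the paper's layer-wise analysis aside, the baseline operation it builds on can be illustrated with a generic symmetric uniform quantizer; the function below is an assumed sketch of that baseline, not the paper's algorithm for choosing scales.

```python
import torch

def quantize_symmetric(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Generic symmetric per-tensor uniform quantizer (fake quantization)."""
    qmax = 2 ** (num_bits - 1) - 1                  # e.g. 127 for 8 bits
    scale = x.abs().max().clamp(min=1e-8) / qmax    # scale from the tensor's range
    return torch.clamp(torch.round(x / scale), -qmax, qmax) * scale
```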

Training data-efficient image transformers & distillation through attention

TLDR
This work produces a competitive convolution-free transformer by training on ImageNet only, and introduces a teacher-student strategy specific to transformers that relies on a distillation token ensuring that the student learns from the teacher through attention.
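
The distillation-token objective can be illustrated with DeiT's hard-label variant, in which the class token follows the ground truth while the distillation token follows the teacher's hard prediction; the argument names below are assumptions.

```python
import torch
import torch.nn.functional as F

def hard_distillation_loss(cls_logits, dist_logits, labels, teacher_logits):
    """Average of two cross-entropies: class token vs. ground truth,
    distillation token vs. the teacher's argmax label."""
    teacher_labels = teacher_logits.argmax(dim=-1)
    return 0.5 * F.cross_entropy(cls_logits, labels) \
         + 0.5 * F.cross_entropy(dist_logits, teacher_labels)
```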

TernaryBERT: Distillation-aware Ultra-low Bit BERT

TLDR
This work proposes TernaryBERT, which ternarizes the weights in a fine-tuned BERT model, and leverages the knowledge distillation technique in the training process to reduce the accuracy degradation caused by the lower capacity of low bits.
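
A minimal sketch of distillation-aware training in this spirit pairs a layer-wise MSE on hidden states with a softened KL term on logits; TernaryBERT additionally distills attention scores, which is omitted here, and the names and temperature T are assumptions.

```python
import torch.nn.functional as F

def distillation_loss(s_hidden, t_hidden, s_logits, t_logits, T: float = 1.0):
    """Layer-wise representation loss plus a temperature-softened logit loss."""
    rep = sum(F.mse_loss(s, t) for s, t in zip(s_hidden, t_hidden))
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                  F.softmax(t_logits / T, dim=-1),
                  reduction="batchmean") * (T * T)
    return rep + kd
```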

Fully Quantized Network for Object Detection

TLDR
This paper applies novel techniques to produce fully quantized 4-bit detectors based on RetinaNet and Faster R-CNN, and shows that these achieve state-of-the-art performance for quantized detectors.

ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions

TLDR
This paper proposes to generalize the traditional Sign and PReLU functions to enable explicit learning of the distribution reshape and shift at near-zero extra cost, and shows that the proposed ReActNet outperforms state-of-the-art methods by a large margin.
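
The generalized activations have simple closed forms: RSign adds a learnable per-channel threshold to Sign, and RPReLU adds learnable shifts before and after a PReLU. The PyTorch sketch below assumes 4-D (N, C, H, W) inputs and omits the straight-through estimator a binary network would need for the backward pass.

```python
import torch
import torch.nn as nn

class RSign(nn.Module):
    """Sign with a learnable per-channel threshold."""
    def __init__(self, channels: int):
        super().__init__()
        self.alpha = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x):
        return torch.sign(x - self.alpha)   # STE needed in training, omitted here

class RPReLU(nn.Module):
    """PReLU with learnable shifts around the nonlinearity."""
    def __init__(self, channels: int):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.zeta = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.full((1, channels, 1, 1), 0.25))

    def forward(self, x):
        shifted = x - self.gamma
        return torch.where(shifted > 0, shifted, self.beta * shifted) + self.zeta
```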

XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks

TLDR
The Binary-Weight-Network version of AlexNet is compared with recent network binarization methods, BinaryConnect and BinaryNets, and outperforms these methods by large margins on ImageNet, more than 16% in top-1 accuracy.
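
The binary-weight approximation is compact enough to sketch: each filter is replaced by its sign times a per-filter scale equal to the mean absolute weight, so W ≈ alpha * sign(W). The snippet assumes a 4-D convolutional weight tensor of shape (out, in, kh, kw).

```python
import torch

def binarize_weights(w: torch.Tensor) -> torch.Tensor:
    """Binary-Weight-Network approximation with a per-filter scale."""
    alpha = w.abs().mean(dim=(1, 2, 3), keepdim=True)  # one scale per output filter
    return alpha * torch.sign(w)
```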

End-to-End Object Detection with Transformers

TLDR
This work presents a new method that views object detection as a direct set prediction problem, and demonstrates accuracy and run-time performance on par with the well-established and highly-optimized Faster R-CNN baseline on the challenging COCO object detection dataset.
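
The set-prediction view hinges on a one-to-one bipartite matching between predictions and ground-truth objects; a minimal sketch using SciPy's Hungarian solver is shown below, with the construction of the cost matrix (class-probability and box terms) omitted.

```python
import torch
from scipy.optimize import linear_sum_assignment

def match_predictions(cost: torch.Tensor):
    """One-to-one assignment minimizing the total matching cost,
    the bipartite-matching step behind a set-prediction loss."""
    rows, cols = linear_sum_assignment(cost.detach().cpu().numpy())
    return rows, cols   # prediction indices paired with target indices
```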

Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm

TLDR
A novel model, dubbed Bi-Real net, is proposed: it connects the real activations (after the 1-bit convolution and/or BatchNorm layer, before the sign function) to the activations of the consecutive block through an identity shortcut, achieving up to 10% higher top-1 accuracy with more memory saving and lower computational cost.
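
The identity shortcut is straightforward to sketch; in the snippet below the full-precision nn.Conv2d is only a stand-in for a true 1-bit (sign-activation, binary-weight) convolution, and the approximated backward pass for the sign function is omitted.

```python
import torch
import torch.nn as nn

class BiRealBlock(nn.Module):
    """Binary conv path plus an identity shortcut that carries the
    real-valued activations past the 1-bit convolution."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        return self.bn(self.conv(torch.sign(x))) + x  # shortcut keeps real values
```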

ImageNet classification with deep convolutional neural networks

TLDR
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.