Corpus ID: 239050150

Vis-TOP: Visual Transformer Overlay Processor

@article{Hu2021VisTOPVT,
  title={Vis-TOP: Visual Transformer Overlay Processor},
  author={Wei Hu and Dian Xu and Zimeng Fan and Fang Liu and Yanxiang He},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.10957}
}
In recent years, the Transformer [23] has achieved good results in Natural Language Processing (NLP) and has also begun to expand into Computer Vision (CV), where excellent models such as the Vision Transformer [5] and Swin Transformer [17] have emerged. At the same time, Transformer models have been ported to embedded devices to serve resource-sensitive application scenarios. However, due to the large number of parameters, the complex computational flow and the many different… 
1 Citation

Row-wise Accelerator for Vision Transformer
TLDR
A hardware accelerator for vision transformers with row-wise scheduling is proposed, which decomposes the major operations in vision transformers into a single dot-product primitive for unified and efficient execution.
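
The row-wise scheduling idea can be illustrated with a small sketch (not code from either paper; function names and sizes are ours): every matrix multiply in an attention or MLP layer is lowered to a loop of dot products, one output row at a time, so a single dot-product unit suffices.

import numpy as np

def rowwise_matmul(A, B):
    # Compute A @ B one output row at a time; the only compute
    # primitive used is a dot product (illustrative sketch).
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    out = np.empty((M, N), dtype=A.dtype)
    for m in range(M):            # one output row per step
        for n in range(N):        # each output element is a single dot product
            out[m, n] = np.dot(A[m, :], B[:, n])
    return out

A = np.random.rand(4, 8).astype(np.float32)
B = np.random.rand(8, 3).astype(np.float32)
assert np.allclose(rowwise_matmul(A, B), A @ B, atol=1e-5)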

References

SHOWING 1-10 OF 28 REFERENCES
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
  • Ze Liu, Yutong Lin, B. Guo
  • Computer Science
    2021 IEEE/CVF International Conference on Computer Vision (ICCV)
  • 2021
TLDR
A hierarchical Transformer whose representation is computed with shifted windows is presented; it has the flexibility to model at various scales, has linear computational complexity with respect to image size, and is also expected to prove beneficial for all-MLP architectures.
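
A minimal sketch of the (shifted) window partitioning behind that claim, with a toy NumPy feature map and helper names of our own: attention is restricted to fixed-size local windows, so its cost grows linearly with image size, and the windows are cyclically shifted between layers so information can cross window borders.

import numpy as np

def window_partition(x, ws):
    # Split an (H, W, C) feature map into non-overlapping ws x ws windows.
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def shifted_windows(x, ws):
    # Cyclically shift the map by ws // 2 before partitioning, so the next
    # attention layer mixes tokens across the previous window borders.
    shifted = np.roll(x, shift=(-(ws // 2), -(ws // 2)), axis=(0, 1))
    return window_partition(shifted, ws)

x = np.random.rand(8, 8, 16)           # toy 8x8 feature map, 16 channels
print(window_partition(x, 4).shape)    # (4, 16, 16): 4 windows of 16 tokens
print(shifted_windows(x, 4).shape)     # same shape, different token grouping
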
Lite Transformer with Long-Short Range Attention
TLDR
This paper investigates the mobile setting for NLP tasks to facilitate deployment on edge devices and designs Lite Transformer, which demonstrates consistent improvement over the Transformer on three well-established language tasks: machine translation, abstractive summarization, and language modeling.
Training data-efficient image transformers & distillation through attention
TLDR
This work produces a competitive convolution-free transformer by training on ImageNet only, and introduces a teacher-student strategy specific to transformers that relies on a distillation token, ensuring that the student learns from the teacher through attention.
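
A minimal sketch of the distillation-token idea, with toy NumPy arrays and dimensions of our own choosing: a second learnable token is appended next to the class token, travels through the same attention layers as the patches, and its output is supervised by the teacher's prediction.

import numpy as np

rng = np.random.default_rng(0)
C = 192                                        # toy embedding dimension
patch_tokens = rng.standard_normal((196, C))   # 14x14 image patches
cls_token   = rng.standard_normal((1, C))      # learnable, trained on true labels
dist_token  = rng.standard_normal((1, C))      # learnable, trained to match the teacher

# The transformer sees class token, distillation token and patches together,
# so the distillation token interacts with the patches through attention.
sequence = np.concatenate([cls_token, dist_token, patch_tokens], axis=0)
print(sequence.shape)                          # (198, 192)
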
NPE: An FPGA-based Overlay Processor for Natural Language Processing
TLDR
NPE provides a cost-effective and power-efficient FPGA-based solution for Natural Language Processing at the edge and offers software-like programmability to the end user and can be upgraded for future NLP models without requiring reconfiguration.
Scalable Vision Transformers with Hierarchical Pooling
TLDR
A Hierarchical Visual Transformer (HVT) is proposed which progressively pools visual tokens to shrink the sequence length and hence reduce the computational cost, analogous to feature-map downsampling in Convolutional Neural Networks (CNNs).
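
A minimal sketch of that token pooling, assuming simple average pooling over a NumPy token sequence (the helper name and sizes are ours): each pooling stage halves the number of tokens, and the attention cost shrinks accordingly.

import numpy as np

def pool_tokens(tokens, stride=2):
    # Average-pool a (N, C) token sequence along the token axis,
    # shrinking the sequence length (and attention cost) by `stride`.
    N, C = tokens.shape
    N_trim = (N // stride) * stride
    return tokens[:N_trim].reshape(-1, stride, C).mean(axis=1)

tokens = np.random.rand(196, 64)               # 196 patch tokens, 64 channels
print(pool_tokens(tokens).shape)               # (98, 64): half as many tokens
print(pool_tokens(pool_tokens(tokens)).shape)  # (49, 64) after a second stage
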
Attention Is All You Need
TLDR
A new, simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
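
The core operation of that architecture is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V; a minimal single-head NumPy sketch (toy sizes are ours):

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ V

Q = np.random.rand(10, 64)   # 10 query tokens, d_k = 64
K = np.random.rand(10, 64)
V = np.random.rand(10, 64)
print(scaled_dot_product_attention(Q, K, V).shape)   # (10, 64)
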
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR
A new language representation model, BERT, is introduced, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
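
A minimal sketch of what "one additional output layer" means in practice, with toy NumPy stand-ins for BERT's pooled [CLS] output (sizes and names are ours): a single linear head maps the pooled representation to task logits.

import numpy as np

rng = np.random.default_rng(0)
hidden = 768                                  # BERT-base hidden size
num_labels = 2                                # e.g. binary sentence classification

# Illustrative fine-tuning head: one linear layer on the [CLS] vector.
W = rng.standard_normal((hidden, num_labels)) * 0.02
b = np.zeros(num_labels)

cls_vector = rng.standard_normal(hidden)      # stand-in for BERT's [CLS] output
logits = cls_vector @ W + b
print(logits.shape)                           # (2,): ready for softmax + cross-entropy
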
Hardware Accelerator for Multi-Head Attention and Position-Wise Feed-Forward in the Transformer
TLDR
This work proposes the first hardware accelerator for two key components, i.e., the multi-head attention (MHA) ResBlock and the position-wise feed-forward network (FFN) ResBlock, which are the two most complex layers in the Transformer.
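
A minimal sketch of the position-wise feed-forward residual block accelerated there, written with toy NumPy weights (post-norm variant; names and sizes are ours, not the accelerator's dataflow):

import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def ffn_res_block(x, W1, b1, W2, b2):
    # Position-wise feed-forward block with residual connection:
    # LayerNorm(x + max(0, x W1 + b1) W2 + b2), applied to every token.
    hidden = np.maximum(0.0, x @ W1 + b1)     # ReLU expansion (d_model -> d_ff)
    return layer_norm(x + hidden @ W2 + b2)   # projection back plus residual

d_model, d_ff, n_tokens = 64, 256, 10
rng = np.random.default_rng(0)
x  = rng.standard_normal((n_tokens, d_model))
W1 = rng.standard_normal((d_model, d_ff)) * 0.02
W2 = rng.standard_normal((d_ff, d_model)) * 0.02
print(ffn_res_block(x, W1, np.zeros(d_ff), W2, np.zeros(d_model)).shape)  # (10, 64)
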
FTRANS: energy-efficient acceleration of transformers using FPGA
TLDR
This paper proposes an efficient acceleration framework, FTRANS, for transformer-based large-scale language representations, which includes an enhanced block-circulant matrix (BCM)-based weight representation to enable model compression of large-scale language representations at the algorithm level with little accuracy degradation.
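
A minimal sketch of why block-circulant weights compress a model, using NumPy's FFT (names and sizes are ours, not FTRANS code): each n x n circulant block is fully defined by a single length-n vector, and its matrix-vector product reduces to a circular convolution computable in O(n log n).

import numpy as np

def circulant_matvec(c, x):
    # Multiply a circulant matrix (defined by its first column c) with x
    # via the FFT, instead of storing the full n x n block.
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

n = 8
rng = np.random.default_rng(0)
c = rng.standard_normal(n)           # one vector represents the whole n x n block
x = rng.standard_normal(n)

# Reference: build the dense circulant matrix explicitly and compare.
dense = np.stack([np.roll(c, k) for k in range(n)], axis=1)
assert np.allclose(dense @ x, circulant_matvec(c, x))
print("storage per block:", n, "values instead of", n * n)
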
DLA: Compiler and FPGA Overlay for Neural Network Inference Acceleration
  • M. Abdelfattah, David Han, G. Chiu
  • Computer Science
    2018 28th International Conference on Field Programmable Logic and Applications (FPL)
  • 2018
TLDR
This paper introduces an overlay targeted at deep neural network inference, with only ~1% overhead to support the control and reprogramming logic via a lightweight very-long instruction word (VLIW) network, and implements a sophisticated domain-specific graph compiler that compiles deep learning frameworks such as Caffe or TensorFlow to easily target this overlay.