TensorLib: A Spatial Accelerator Generation Framework for Tensor Algebra

@article{Jia2021TensorLibAS,
  title={TensorLib: A Spatial Accelerator Generation Framework for Tensor Algebra},
  author={Liancheng Jia and Zizhang Luo and Liqiang Lu and Yun Liang},
  journal={2021 58th ACM/IEEE Design Automation Conference (DAC)},
  year={2021},
  pages={865-870}
}
Tensor algebra finds applications in many domains, and these applications, especially when accelerated on spatial hardware accelerators, can deliver high performance and low power. However, spatial hardware accelerators exhibit a complex design space. Prior approaches based on manual implementation suffer from low programming productivity, making thorough design space exploration impossible. In this paper, we propose TensorLib, a framework for generating spatial hardware accelerators for tensor algebra…
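
The truncated abstract refers to spatial accelerators in which a grid of processing elements (PEs) exchanges operands over local links, and to the dataflow design space such accelerators expose. As a purely illustrative sketch, and not TensorLib's generator or API, the following Python model simulates one common design point, an output-stationary systolic array, for a small matrix multiplication; the function name systolic_matmul and the register layout are hypothetical choices for this example.

# Purely illustrative model (hypothetical example, not TensorLib code):
# in an output-stationary systolic array, PE(i, j) keeps accumulator C[i][j]
# resident while rows of A flow rightward and columns of B flow downward.
def systolic_matmul(A, B):
    n, k, m = len(A), len(A[0]), len(B[0])
    assert len(B) == k, "inner dimensions must match"

    acc = [[0] * m for _ in range(n)]      # per-PE accumulators (stay in place)
    a_reg = [[0] * m for _ in range(n)]    # horizontal pipeline registers
    b_reg = [[0] * m for _ in range(n)]    # vertical pipeline registers

    # Enough cycles for the skewed input wavefronts to pass through every PE.
    for t in range(n + m + k - 2):
        # 1. Shift: operands advance one PE per cycle (right for A, down for B).
        for i in range(n):
            for j in range(m - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
        for j in range(m):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]

        # 2. Inject skewed inputs at the array boundary: row i and column j are
        #    delayed by i and j cycles, so matching operands meet inside PE(i, j).
        for i in range(n):
            p = t - i
            a_reg[i][0] = A[i][p] if 0 <= p < k else 0
        for j in range(m):
            p = t - j
            b_reg[0][j] = B[p][j] if 0 <= p < k else 0

        # 3. Compute: each PE multiplies the operands passing through it and
        #    accumulates into its local (output-stationary) register.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]

    return acc

if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6]]          # 2x3
    B = [[7, 8], [9, 10], [11, 12]]     # 3x2
    print(systolic_matmul(A, B))        # [[58, 64], [139, 154]]

Choosing which operand stays resident in each PE (output-, weight-, or input-stationary) is exactly the kind of dataflow decision that makes the design space large, which is what motivates an automated generation framework such as TensorLib.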

Citations

HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation
This work proposes an agile co-design approach, HASCO, that provides an efficient HW/SW solution to dense tensor computation and develops a multi-objective Bayesian optimization algorithm to explore hardware optimizations.

TENET: A Framework for Modeling Tensor Dataflow Based on Relation-centric Notation
A relation-centric notation is introduced that formally describes the hardware dataflow for tensor computation. It is more expressive than compute-centric and data-centric notations because it uses more sophisticated affine transformations, and it inherently supports accurate metric estimation, including data reuse, bandwidth, latency, and energy.

Towards Agile DNN Accelerator Design Using Incremental Synthesis on FPGAs
An incremental synthesis framework, Acoda, is proposed to rapidly design DNN accelerators on FPGAs. Based on the observation that most revisions to DNNs are minor and local, it reuses existing hardware modules and incrementally modifies the accelerator.

References

Showing 1-10 of 19 references.

T2S-Tensor: Productively Generating High-Performance Spatial Hardware for Dense Tensor Computations
We present a language and compilation framework for productively generating high-performance systolic arrays for dense tensor kernels on spatial architectures, including FPGAs and CGRAs. It decouples…

HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation
This work proposes an agile co-design approach, HASCO, that provides an efficient HW/SW solution to dense tensor computation and develops a multi-objective Bayesian optimization algorithm to explore hardware optimizations.

Generating Systolic Array Accelerators With Reusable Blocks
This article analyzes the systolic array design space, identifies the common structures of different systolic dataflows, and builds hardware module templates using the Chisel infrastructure that can be reused across dataflows and computation algorithms.

Gemmini: An Agile Systolic Array Generator Enabling Systematic Evaluations of Deep-Learning Architectures
Gemmini is an open-source, agile systolic array generator that enables systematic evaluations of deep-learning architectures and achieves two to three orders of magnitude speedup in deep neural network inference compared to baseline execution on a host processor.

TENET: A Framework for Modeling Tensor Dataflow Based on Relation-centric Notation
A relation-centric notation is introduced that formally describes the hardware dataflow for tensor computation. It is more expressive than compute-centric and data-centric notations because it uses more sophisticated affine transformations, and it inherently supports accurate metric estimation, including data reuse, bandwidth, latency, and energy.

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Reconfigurable Computing
Experimental results show that HeteroCL allows programmers to explore the design space efficiently in both performance and accuracy by combining different types of hardware customization and targeting spatial architectures, while keeping the algorithm code intact.

AutoSA: A Polyhedral Compiler for High-Performance Systolic Arrays on FPGA
This work presents AutoSA, an end-to-end compilation framework for generating systolic arrays on FPGA based on the polyhedral framework, and further incorporates a set of optimizations on different dimensions to boost performance.

In-datacenter performance analysis of a tensor processing unit
N. Jouppi, C. Young, D. Yoon, et al. 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), 2017.
This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU), deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN), and compares it to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters.

Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs
This paper implements a CNN on an FPGA using a systolic array architecture, which can achieve a high clock frequency under high resource utilization. It provides an analytical model for performance and resource utilization and develops an automatic design space exploration framework.

PolySA: Polyhedral-Based Systolic Array Auto-Compilation
J. Cong, Jie Wang. 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2018.
PolySA is the first fully automated compilation framework for generating high-performance systolic array architectures on FPGAs, leveraging recent advances in high-level synthesis, and is able to generate optimal designs within one hour with performance comparable to state-of-the-art manual designs.