Corpus ID: 236034289

S2TA: Exploiting Structured Sparsity for Energy-Efficient Mobile CNN Acceleration

  • Zhi-Gang Liu, Paul N. Whatmough, Yuhao Zhu, Matthew Mattina
Exploiting sparsity is a key technique in accelerating quantized convolutional neural network (CNN) inference on mobile devices. Prior sparse CNN accelerators largely exploit unstructured sparsity and achieve significant speedups. However, due to the unbounded, largely unpredictable sparsity patterns, exploiting unstructured sparsity requires complicated hardware designs with significant energy and area overheads, which is particularly detrimental to mobile/IoT inference scenarios where energy…
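Structured sparsity of the kind S2TA targets bounds the number of nonzeros within each fixed-size block of weights, so the hardware always knows the worst-case work per block. The following is a minimal NumPy sketch of that idea; the block size and density bound here are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

def prune_density_bound(weights, block=4, max_nonzeros=2):
    """Keep at most `max_nonzeros` largest-magnitude values in each
    group of `block` consecutive weights; zero out the rest.
    Illustrative density-bound pruning; parameters are assumptions."""
    w = weights.flatten().astype(float).copy()
    pad = (-len(w)) % block           # pad so length divides evenly
    w = np.concatenate([w, np.zeros(pad)]).reshape(-1, block)
    # indices of the (block - max_nonzeros) smallest magnitudes per block
    drop = np.argsort(np.abs(w), axis=1)[:, : block - max_nonzeros]
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(-1)[: weights.size].reshape(weights.shape)

w = np.array([0.9, -0.1, 0.05, 0.8, -0.02, 0.3, 0.01, -0.6])
pruned = prune_density_bound(w, block=4, max_nonzeros=2)
# each block of 4 now holds at most 2 nonzeros
```

Because every block carries at most `max_nonzeros` values, a datapath can be provisioned for exactly that many multiplies per block, avoiding the load imbalance and indexing overhead of unstructured sparsity.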


SCNN: An accelerator for compressed-sparse convolutional neural networks
The Sparse CNN (SCNN) accelerator architecture is introduced, which improves performance and energy efficiency by exploiting the zero-valued weights that stem from network pruning during training and the zero-valued activations that arise from the common ReLU operator.
Bit-Tactical: A Software/Hardware Approach to Exploiting Value and Bit Sparsity in Neural Networks
Two back-end designs, chosen to target bit-sparsity in activations rather than value-sparsity, are empirically motivated by two benefits: (a) they avoid handling the dynamically sparse whole-value activation stream, and (b) they uncover more ineffectual work.
SparTen: A Sparse Tensor Accelerator for Convolutional Neural Networks
SparTen is proposed, which achieves an efficient inner join in sparse CNNs by providing native support for two-sided sparse execution and memory storage. To tackle load imbalance, SparTen employs a software scheme called greedy balancing, which groups filters by density via two variants.
TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training
TensorDash is a hardware-based technique that enables data-parallel MAC units to take advantage of sparsity in their input operand streams, speeding up the training process while also increasing energy efficiency when used to compose a hardware accelerator for deep learning.
Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference
This letter generalizes the traditional scalar PE into a Tensor-PE, which gives rise to a family of new Systolic Tensor Array (STA) microarchitectures that increase intra-PE operand reuse and datapath efficiency, reducing circuit area and power dissipation.
Sparse Tensor Core: Algorithm and Hardware Co-Design for Vector-wise Sparse Neural Networks on Modern GPUs
A novel pruning algorithm is devised to improve workload balance and reduce the decoding overhead of sparse neural networks, and new instructions and micro-architecture optimizations are proposed for the Tensor Core to adapt to structurally sparse neural networks.
Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices
Eyeriss v2, a DNN accelerator architecture designed for running compact and sparse DNNs, is presented; it introduces a highly flexible on-chip network that can adapt to the different amounts of data reuse and bandwidth requirements of different data types, improving the utilization of the computation resources.
EIE: Efficient Inference Engine on Compressed Deep Neural Network
  • Song Han, Xingyu Liu, +4 authors W. Dally
  • Computer Science
  • 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)
  • 2016
An energy-efficient inference engine (EIE) is proposed that performs inference on the compressed network model and accelerates the resulting sparse matrix-vector multiplication with weight sharing; it is 189x and 13x faster than CPU and GPU implementations, respectively, of the same DNN without compression.
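The combination EIE exploits, a sparse matrix-vector product whose stored weights are small indices into a shared codebook, can be sketched in a few lines. This is a simplified CSR-style illustration under assumed codebook and index widths, not EIE's actual encoding:

```python
import numpy as np

def shared_weight_spmv(codebook, code_idx, col_idx, row_ptr, x):
    """Sparse matrix-vector product in CSR layout where each stored
    weight is an index into a shared codebook (weight sharing).
    Illustrative sketch; the real format packs indices more tightly."""
    y = np.zeros(len(row_ptr) - 1)
    for r in range(len(y)):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            # decode the weight through the codebook, skip stored zeros
            y[r] += codebook[code_idx[k]] * x[col_idx[k]]
    return y

# 2x3 matrix [[0, 1.5, 0], [-0.5, 0, 1.5]] with a 2-entry codebook
codebook = np.array([-0.5, 1.5])
code_idx = [1, 0, 1]          # per-nonzero index into the codebook
col_idx  = [1, 0, 2]          # column of each nonzero
row_ptr  = [0, 1, 3]          # CSR row boundaries
x = np.array([2.0, 1.0, 4.0])
y = shared_weight_spmv(codebook, code_idx, col_idx, row_ptr, x)
```

Only the nonzero weights are touched, and each stored weight shrinks to a few index bits plus one shared codebook lookup, which is where the compression and energy savings come from.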
SCALE-Sim: Systolic CNN Accelerator Simulator
This work introduces the Systolic CNN Accelerator Simulator (SCALE-Sim), a configurable, systolic-array-based, cycle-accurate DNN accelerator simulator that exposes various micro-architectural features as well as system-integration parameters to the designer, enabling comprehensive design-space exploration.
MASR: A Modular Accelerator for Sparse RNNs
  • Udit Gupta, Brandon Reagen, +5 authors D. Brooks
  • Computer Science, Engineering
  • 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT)
  • 2019
MASR is a principled and modular architecture that accelerates bidirectional RNNs for on-chip ASR and is designed to exploit sparsity in both dynamic activations and static weights.