CUTIE: Beyond PetaOp/s/W Ternary DNN Inference Acceleration With Better-Than-Binary Energy Efficiency

@article{Scherer2022CUTIEBP,
  title={CUTIE: Beyond PetaOp/s/W Ternary DNN Inference Acceleration With Better-Than-Binary Energy Efficiency},
  author={Moritz Scherer and Georg Rutishauser and Lukas Cavigelli and Luca Benini},
  journal={IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems},
  year={2022},
  volume={41},
  pages={1020-1033}
}
We present a 3.1 POp/s/W fully digital hardware accelerator for ternary neural networks (TNNs). CUTIE, the completely unrolled ternary inference engine, focuses on minimizing noncomputational energy and switching activity so that dynamic power spent on storing (locally or globally) intermediate results is minimized. This is achieved by: 1) a data-path architecture completely unrolled in the feature map and filter dimensions to reduce switching activity by favoring silencing over iterative… 
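
To make the "silencing over iterative computation" idea concrete, here is a minimal functional sketch of the ternary arithmetic that CUTIE's unrolled data path parallelizes. The function names, tensor shapes, and the threshold-based ternarization rule are illustrative assumptions, not the paper's RTL:

```python
import numpy as np

def ternary_conv_pixel(window, kernels):
    """Pre-activations of all output channels for one output pixel.

    window:  (K, K, C_in)        ternary input patch, values in {-1, 0, +1}
    kernels: (C_out, K, K, C_in) ternary filters,     values in {-1, 0, +1}

    In CUTIE this entire reduction is a fully unrolled combinational tree
    evaluated in parallel: a ternary "multiply" is mere sign selection or
    zeroing, and a zero operand silences its branch of the tree, so it
    causes no switching activity.
    """
    return np.einsum('ijc,oijc->o', window, kernels)

def ternarize(preact, thresh_lo, thresh_hi):
    """Map integer pre-activations back to {-1, 0, +1} via two thresholds
    (a common ternarization rule; the threshold values are assumptions)."""
    return np.where(preact > thresh_hi, 1,
                    np.where(preact < thresh_lo, -1, 0)).astype(np.int8)
```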

Low- and Mixed-Precision Inference Accelerators

TLDR
In this chapter, design choices and their implications on the flexibility and energy efficiency of several accelerators supporting extremely quantized networks are reviewed.

Ternarized TCN for μJ/Inference Gesture Recognition from DVS Event Frames

TLDR
This paper assembles ternary video frames from the event stream and processes them with a fully ternarized Temporal Convolutional Network, which can be mapped to CUTIE, a highly energy-efficient Ternary Neural Network accelerator.
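
As a rough illustration of how ternary frames might be assembled from a DVS event stream, consider the sketch below. The event layout and the sign-of-accumulation rule are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def events_to_ternary_frame(events, height, width):
    """events: iterable of (x, y, polarity) tuples, polarity in {-1, +1}.
    Returns an (H, W) frame with values in {-1, 0, +1}: the sign of the
    net polarity each pixel accumulated during the frame window, which a
    ternarized TCN can consume directly.
    """
    frame = np.zeros((height, width), dtype=np.int32)
    for x, y, polarity in events:
        frame[y, x] += polarity          # accumulate ON/OFF events per pixel
    return np.sign(frame).astype(np.int8)  # ternarize by sign of the net count
```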

How to train accurate BNNs for embedded systems?

TLDR
This chapter contains an empirical review that evaluates the benefits of many repair methods in isolation over the ResNet-20 & CIFAR-10 and ResNet-18 & CIFAR-100 benchmarks and finds three repair categories most beneficial: feature binarizer, feature normalization, and double residual.

Sub-mW Keyword Spotting on an MCU: Analog Binary Feature Extraction and Binary Neural Networks

TLDR
This work addresses KWS energy efficiency on low-cost microcontroller units (MCUs) with an analog front-end that combines analog binary feature extraction with binary neural networks, and shows that the energy required for data acquisition and preprocessing can be substantially reduced.

Vau Da Muntanialas: Energy-Efficient Multi-Die Scalable Acceleration of RNN Inference

TLDR
Muntaniala is presented, an RNN accelerator architecture for LSTM inference with a silicon-measured energy efficiency of 3.25 TOp/s/W and performance of 30.53 GOp/s in UMC 65 nm technology, and a phoneme error rate drop of approximately 3% with respect to floating point on a 3L-384NH-123NI LSTM network on the TIMIT dataset.

A Construction Kit for Efficient Low Power Neural Network Accelerator Designs

TLDR
A quantitative overview of neural network accelerator optimization approaches used in recent works, and their individual effects on edge processing performance, is reported, giving chip designers an overview of design choices for implementing efficient low-power neural network accelerators.

References

Showing 1–10 of 68 references

ChewBaccaNN: A Flexible 223 TOPS/W BNN Accelerator

TLDR
ChewBaccaNN is presented, a 0.7 mm² binary convolutional neural network (CNN) accelerator designed in GlobalFoundries 22 nm technology that performs CIFAR-10 inference at 86.8% accuracy and runs a binarized ResNet-18 trained with 8-bases Group-Net to achieve 67.5% top-1 accuracy.

YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights

TLDR
A hardware accelerator optimized for BinaryConnect CNNs is presented; it achieves 1510 GOp/s on a core area of only 1.33 MGE, with a power dissipation of 153 mW in UMC 65 nm technology at 1.2 V.

EBPC: Extended Bit-Plane Compression for Deep Neural Network Inference and Training Accelerators

TLDR
This work introduces and evaluates a novel, hardware-friendly, and lossless compression scheme for the feature maps present within convolutional neural networks, and achieves compression factors for gradient map compression during training that are even better than for inference.

TiM-DNN: Ternary In-Memory Accelerator for Deep Neural Networks

TLDR
TiM-DNN, a programmable in-memory accelerator specifically designed to execute ternary DNNs, is proposed and evaluated across a suite of state-of-the-art DNN benchmarks including both deep convolutional and recurrent neural networks.

Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine

TLDR
Hyperdrive is presented: a BWN accelerator that dramatically reduces I/O bandwidth by exploiting a novel binary-weight streaming approach, and that can be used for arbitrarily sized convolutional neural network architectures and input resolutions by exploiting the natural scalability of the compute units at both chip and system level.

Extended Bit-Plane Compression for Convolutional Neural Network Accelerators

  • L. Cavigelli, L. Benini
  • Computer Science
    2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)
  • 2019
TLDR
This work introduces and evaluates a novel, hardware-friendly compression scheme for the feature maps present within convolutional neural networks and shows that an average compression ratio of 4.4× relative to uncompressed data and a gain of 60% over existing methods can be achieved for ResNet-34 with a compression block requiring fewer than 300 bits of sequential cells and minimal combinational logic.

A 64-Tile 2.4-Mb In-Memory-Computing CNN Accelerator Employing Charge-Domain Compute

TLDR
This paper addresses data movement via an in-memory-computing accelerator that employs charge-domain mixed-signal operation for enhancing compute SNR and, thus, scalability in large-scale matrix-vector multiplications.

BinarEye: An always-on energy-accuracy-scalable binary CNN processor with all memory on chip in 28nm CMOS

TLDR
BinarEye, the first digital processor for always-on binary convolutional neural networks, maximizes data reuse through a Neuron Array exploiting local weight flip-flops and requires no off-chip bandwidth, which leads to a 230 1b-TOPS/W peak efficiency.

BinaryConnect: Training Deep Neural Networks with binary weights during propagations

TLDR
BinaryConnect is introduced, a method which consists in training a DNN with binary weights during the forward and backward propagations, while retaining precision of the stored weights in which gradients are accumulated, and near state-of-the-art results with BinaryConnect are obtained on the permutation-invariant MNIST, CIFAR-10 and SVHN.
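
A minimal sketch of the BinaryConnect update rule described above may clarify it: weights are binarized for the forward and backward passes, while gradients accumulate in the stored full-precision copy. Here `grad_fn` is a hypothetical callable returning dL/dW evaluated at the binary weights, and the learning rate is illustrative:

```python
import numpy as np

def binary_connect_step(w_real, grad_fn, lr=0.01):
    """One BinaryConnect training step (sketch, assuming a supplied grad_fn)."""
    w_bin = np.where(w_real >= 0, 1.0, -1.0)  # deterministic binarization to {-1, +1}
    grad = grad_fn(w_bin)                     # propagation uses only the binary weights
    w_real = w_real - lr * grad               # gradients accumulate in the real-valued copy
    return np.clip(w_real, -1.0, 1.0)         # clip, as in the paper, so weights stay
                                              # where binarization remains meaningful
```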
...