# CUTIE: Beyond PetaOp/s/W Ternary DNN Inference Acceleration With Better-Than-Binary Energy Efficiency

@article{Scherer2022CUTIEBP,
  title   = {CUTIE: Beyond PetaOp/s/W Ternary DNN Inference Acceleration With Better-Than-Binary Energy Efficiency},
  author  = {Moritz Scherer and Georg Rutishauser and Lukas Cavigelli and Luca Benini},
  journal = {IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems},
  year    = {2022},
  volume  = {41},
  pages   = {1020-1033}
}
• Published 3 November 2020
• Computer Science
• IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
We present a 3.1 POp/s/W fully digital hardware accelerator for ternary neural networks (TNNs). CUTIE, the Completely Unrolled Ternary Inference Engine, focuses on minimizing non-computational energy and switching activity so that the dynamic power spent on storing intermediate results (locally or globally) is minimized. This is achieved by: 1) a data-path architecture completely unrolled in the feature-map and filter dimensions to reduce switching activity by favoring silencing over iterative…
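The "silencing" idea in the abstract can be illustrated with a small sketch: in a ternary dot product, any lane where the weight or activation is zero contributes nothing, so an unrolled datapath can gate those multipliers instead of toggling them. This is a hypothetical illustration of the principle, not the paper's RTL or its actual datapath.

```python
import numpy as np

def ternary_dot(activations, weights):
    """Dot product of two ternary vectors (values in {-1, 0, +1}).

    Lanes with a zero operand are skipped ("silenced"), mirroring how
    a fully unrolled ternary datapath can gate inactive multipliers
    to cut switching activity. Illustrative sketch only.
    """
    acc = 0
    for a, w in zip(activations, weights):
        if a == 0 or w == 0:      # silenced lane: no switching, no add
            continue
        acc += a * w              # remaining products are just +/-1
    return acc

# example: two random ternary vectors agree with a plain dot product
rng = np.random.default_rng(0)
x = rng.integers(-1, 2, size=8)   # values drawn from {-1, 0, +1}
w = rng.integers(-1, 2, size=8)
assert ternary_dot(x, w) == int(np.dot(x, w))
```

Because ternary operands are only ever -1, 0, or +1, the "multiplier" reduces to a conditional add or subtract, which is part of why ternary datapaths can be so energy-efficient.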

## Citations

### Low- and Mixed-Precision Inference Accelerators

• Computer Science
ArXiv
• 2022
In this chapter, design choices and their implications on the flexibility and energy efficiency of several accelerators supporting extremely quantized networks are reviewed.

### How to train accurate BNNs for embedded systems?

• Computer Science
ArXiv
• 2022
This chapter contains an empirical review that evaluates the benefits of many repair methods in isolation over the ResNet-20&CIFAR-10 and ResNet-18&CIFAR-100 benchmarks and finds three repair categories most beneficial: feature binarizer, feature normalization, and double residual.

### Ternarized TCN for μJ/Inference Gesture Recognition from DVS Event Frames

• Computer Science
2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)
• 2022
This paper assembles ternary video frames from the event stream and processes them with a fully ternarized Temporal Convolutional Network that can be mapped to CUTIE, a highly energy-efficient Ternary Neural Network accelerator.

### Sub-mW Keyword Spotting on an MCU: Analog Binary Feature Extraction and Binary Neural Networks

• Computer Science
IEEE Transactions on Circuits and Systems I: Regular Papers
• 2022
This work addresses KWS energy efficiency on low-cost microcontroller units (MCUs) with an analog front-end that combines analog binary feature extraction with binary neural networks, and shows that the energy required for data acquisition and preprocessing can be reduced.

### Vau Da Muntanialas: Energy-Efficient Multi-Die Scalable Acceleration of RNN Inference

• Computer Science
IEEE Transactions on Circuits and Systems I: Regular Papers
• 2022
Muntaniala is presented, an RNN accelerator architecture for LSTM inference with a silicon-measured energy efficiency of 3.25 TOP/s/W and performance of 30.53 GOP/s in UMC 65 nm technology, and a phoneme error rate drop of approximately 3% with respect to floating point on a 3L-384NH-123NI LSTM network on the TIMIT dataset.

### A Construction Kit for Efficient Low Power Neural Network Accelerator Designs

• Computer Science
ACM Transactions on Embedded Computing Systems
• 2022
This work reports a quantitative overview of neural network accelerator optimization approaches used in recent works and their individual effects on edge processing performance, providing chip designers with an overview of design choices for implementing efficient low-power neural network accelerators.

## References

SHOWING 1-10 OF 68 REFERENCES

### ChewBaccaNN: A Flexible 223 TOPS/W BNN Accelerator

• Computer Science
2021 IEEE International Symposium on Circuits and Systems (ISCAS)
• 2021
ChewBaccaNN is presented, a 0.7 mm² binary convolutional neural network (CNN) accelerator designed in GlobalFoundries 22 nm technology that can perform CIFAR-10 inference at 86.8% accuracy and perform inference on a binarized ResNet-18 trained with 8-bases Group-Net to achieve a 67.5% Top-1 accuracy.

### YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights

• Computer Science
2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
• 2016
A HW accelerator optimized for BinaryConnect CNNs that achieves 1510 GOp/s on a core area of only 1.33 MGE and with a power dissipation of 153 mW in UMC 65 nm technology at 1.2 V is presented.

### EBPC: Extended Bit-Plane Compression for Deep Neural Network Inference and Training Accelerators

• Computer Science
IEEE Journal on Emerging and Selected Topics in Circuits and Systems
• 2019
This work introduces and evaluates a novel, hardware-friendly, and lossless compression scheme for the feature maps present within convolutional neural networks, and achieves compression factors for gradient map compression during training that are even better than for inference.

### TiM-DNN: Ternary In-Memory Accelerator for Deep Neural Networks

• Computer Science
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
• 2020
TIM-DNN, a programmable in-memory accelerator that is specifically designed to execute ternary DNNs, is proposed and evaluated across a suite of state-of-the-art DNN benchmarks including both deep convolutional and recurrent neural networks.

### Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine

• Computer Science
IEEE Journal on Emerging and Selected Topics in Circuits and Systems
• 2019
Hyperdrive is presented: a BWN accelerator dramatically reducing the I/O bandwidth exploiting a novel binary-weight streaming approach, which can be used for an arbitrarily sized convolutional neural network architecture and input resolution by exploiting the natural scalability of the compute units both at chip-level and system-level.

### Extended Bit-Plane Compression for Convolutional Neural Network Accelerators

• Computer Science
2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)
• 2019
This work introduces and evaluates a novel, hardware-friendly compression scheme for the feature maps present within convolutional neural networks and shows that an average compression ratio of 4.4× relative to uncompressed data and a gain of 60% over existing method can be achieved for ResNet-34 with a compression block requiring <300 bit of sequential cells and minimal combinational logic.
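The bit-plane idea behind the two compression entries above can be sketched in a simplified form: a feature map is split into per-bit planes, and since post-ReLU feature maps are sparse and small-valued, the high-order planes are mostly zero and run-length encode cheaply. This is an illustration of bit-plane decomposition with a crude run-count cost model, not the papers' actual EBPC encoder.

```python
import numpy as np

def bit_planes(values, bits=8):
    """Split unsigned 8-bit ints into per-bit planes (MSB first)."""
    v = np.asarray(values, dtype=np.uint8)
    return [((v >> b) & 1) for b in range(bits - 1, -1, -1)]

def rle_runs(plane):
    """Crude cost model: number of runs after run-length encoding."""
    runs, prev = 0, None
    for bit in plane:
        if bit != prev:
            runs += 1
            prev = bit
    return runs

# a sparse, small-valued feature map, as is typical after ReLU
fmap = np.array([0, 0, 3, 0, 1, 0, 0, 2], dtype=np.uint8)
planes = bit_planes(fmap)
# the six high-order planes are all-zero -> a single RLE run each
assert all(int(p.sum()) == 0 for p in planes[:6])
assert rle_runs(planes[0]) == 1
```

The actual EBPC scheme adds further stages (e.g. encoding of the non-zero planes), but the sketch shows why sparse activations make bit-plane representations compressible.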

### A 64-Tile 2.4-Mb In-Memory-Computing CNN Accelerator Employing Charge-Domain Compute

• Computer Science
IEEE Journal of Solid-State Circuits
• 2019
This paper addresses data movement via an in-memory-computing accelerator that employs charge-domain mixed-signal operation for enhancing compute SNR and, thus, scalability in large-scale matrix-vector multiplications.

### BinarEye: An always-on energy-accuracy-scalable binary CNN processor with all memory on chip in 28nm CMOS

• Computer Science
2018 IEEE Custom Integrated Circuits Conference (CICC)
• 2018
BinarEye, the first digital processor for always-on binary convolutional neural networks, maximizes data reuse through a Neuron Array exploiting local weight flip-flops and requires no off-chip bandwidth, which leads to a 230 1b-TOPS/W peak efficiency.

### BinaryConnect: Training Deep Neural Networks with binary weights during propagations

• Computer Science
NIPS
• 2015
BinaryConnect is introduced, a method that trains a DNN with binary weights during the forward and backward propagations while retaining the precision of the stored weights in which gradients are accumulated; near state-of-the-art results with BinaryConnect are obtained on permutation-invariant MNIST, CIFAR-10, and SVHN.
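The key mechanism in the summary above, binarizing weights for propagation while accumulating gradients into a full-precision copy, can be sketched on a toy linear least-squares model. Everything here (the model, the learning rate, the data) is a hypothetical illustration, not the paper's training recipe.

```python
import numpy as np

def binarize(w):
    """Deterministic BinaryConnect-style binarization: sign(w) in {-1, +1}."""
    return np.where(w >= 0, 1.0, -1.0)

def train_step(w_real, x, y, lr=0.1):
    """One BinaryConnect-style step on a toy linear model.

    Forward and backward passes use the binarized weights, but the
    gradient update is applied to the full-precision copy `w_real`,
    which is clipped to [-1, 1] as in BinaryConnect.
    """
    w_b = binarize(w_real)            # binary weights for propagation
    pred = x @ w_b                    # forward pass with binary weights
    grad = x.T @ (pred - y) / len(y)  # gradient of the squared error
    return np.clip(w_real - lr * grad, -1.0, 1.0)

# toy data whose targets come from an exactly binary weight vector
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 4))
y = x @ np.array([1.0, -1.0, 1.0, -1.0])
w = rng.normal(scale=0.1, size=4)     # full-precision stored weights
for _ in range(50):
    w = train_step(w, x, y)
assert np.all(np.abs(w) <= 1.0)       # real copy stays clipped
assert set(np.unique(binarize(w))) <= {-1.0, 1.0}
```

The full-precision copy is what lets small gradient signals accumulate across steps until they eventually flip a binary weight's sign, which is the crux of why BinaryConnect trains at all.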