XNOR Neural Engine: A Hardware Accelerator IP for 21.6-fJ/op Binary Neural Network Inference
@article{Conti2018XNORNE,
  title   = {XNOR Neural Engine: A Hardware Accelerator IP for 21.6-fJ/op Binary Neural Network Inference},
  author  = {Francesco Conti and Pasquale Davide Schiavone and Luca Benini},
  journal = {IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems},
  year    = {2018},
  volume  = {37},
  pages   = {2940-2951}
}
Binary neural networks (BNNs) promise accuracy comparable to conventional deep neural networks at a fraction of the cost in terms of memory and energy. We show post-synthesis results in 65- and 22-nm technology for the XNE IP and post-layout results in 22 nm for the full MCU, indicating that this system can drop the energy cost to 21.6 fJ per binary operation at 0.4 V while remaining flexible and performant enough to execute state-of-the-art BNN topologies…
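At the core of the XNE, as in most BNN accelerators, is the standard XNOR-popcount identity: once activations and weights are binarized to {-1, +1} and packed into machine words, a dot product reduces to a bitwise XNOR followed by a population count. The Python sketch below illustrates that arithmetic only; it is a minimal model of the kernel, not the XNE datapath, and all names in it are illustrative.

def popcount(x: int) -> int:
    # Count set bits; on Python 3.10+ x.bit_count() does the same.
    return bin(x).count("1")

def binary_dot(a_bits: int, w_bits: int, n: int) -> int:
    # Dot product of two n-element {-1, +1} vectors packed as bitmasks
    # (bit = 1 encodes +1, bit = 0 encodes -1). XNOR marks the lanes where
    # the operands agree; each agreement contributes +1 and each
    # disagreement -1, so the result is 2 * popcount(xnor) - n.
    mask = (1 << n) - 1               # keep only the n valid lanes
    xnor = ~(a_bits ^ w_bits) & mask  # 1 where the two bits agree
    return 2 * popcount(xnor) - n

# Example: a = [+1, -1, +1, +1], w = [+1, +1, -1, +1] (bit 0 first).
# The vectors agree on lanes 0 and 3, so the dot product is 2*2 - 4 = 0.
assert binary_dot(0b1101, 0b1011, 4) == 0

In hardware this maps to wide XNOR gates feeding a popcount-accumulate tree, which is what makes energy figures in the fJ-per-op range attainable.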
73 Citations
A Multi-Precision Bit-Serial Hardware Accelerator IP for Deep Learning Enabled Internet-of-Things
- 2021 IEEE International Midwest Symposium on Circuits and Systems (MWSCAS)
The Serial-MAC-engine (SMAC-engine), a fully-digital hardware accelerator for inference of quantized DNNs suitable for integration in a heterogeneous System-on-Chip (SoC), is introduced.
BrainTTA: A 35 fJ/op Compiler Programmable Mixed-Precision Transport-Triggered NN SoC
- arXiv, 2022
A programmable SoC with mixed-precision support based on a Transport-Triggered Architecture (TTA) that achieves a peak energy efficiency of 35/67/405 fJ/op (binary, ternary, and 8-bit precision) and a throughput of 614/307/77 GOPS, which is unprecedented for a programmable architecture.
Exploiting Processing in Non-Volatile Memory for Binary Neural Network Accelerators
- arXiv, 2018
This paper introduces NV-Net, a spintronic, reconfigurable in-memory accelerator for binary neural networks that can serve as a standard STT-MRAM array and a computational substrate simultaneously, allowing for massively parallel and energy-efficient computation.
Reconfigurable Binary Neural Network Accelerator with Adaptive Parallelism Scheme
- Electronics, 2021
The designed BNN accelerator is able to fully compute all types of BNN layers thanks to its reconfigurability, and it can achieve a higher area–speed efficiency than existing accelerators.
ChewBaccaNN: A Flexible 223 TOPS/W BNN Accelerator
- 2021 IEEE International Symposium on Circuits and Systems (ISCAS)
ChewBaccaNN is presented, a 0.7 mm² binary convolutional neural network (CNN) accelerator designed in GlobalFoundries 22 nm technology that can perform CIFAR-10 inference at 86.8% accuracy and run a binarized ResNet-18 trained with 8-bases Group-Net to achieve 67.5% top-1 accuracy.
Hardware Approximate Techniques for Deep Neural Network Accelerators: A Survey
- ACM Computing Surveys, 2023
This article provides a comprehensive survey and analysis of hardware approximation techniques for DNN accelerators and presents how approximate computing for DNN accelerators can go beyond energy efficiency and address reliability and security issues as well.
Design framework for an energy-efficient binary convolutional neural network accelerator based on nonvolatile logic
- Nonlinear Theory and Its Applications, IEICE, 2021
A design framework for an energy-efficient BCNN accelerator based on nonvolatile logic is presented, enabling accelerator designs that differ from conventional accelerators based solely on CMOS.
A Resource-Efficient Inference Accelerator for Binary Convolutional Neural Networks
- IEEE Transactions on Circuits and Systems II: Express Briefs, 2021
This brief presents a novel architecture to implement a resource-efficient inference accelerator for binary convolutional neural networks (BCNN). The proposed architecture consistently processes each…
PPAC: A Versatile In-Memory Accelerator for Matrix-Vector-Product-Like Operations
- 2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP)
The Parallel Processor in Associative Content-addressable memory (PPAC), a novel in-memory accelerator that supports a range of matrix-vector-product (MVP)-like operations that find use in traditional and emerging applications, is proposed.
On the Resilience of Deep Learning for Reduced-voltage FPGAs
- 2020 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)
It is found that modern FPGAs are robust enough at extremely low voltage levels and that low-voltage-related faults can be automatically masked within the training iterations, so there is no need for costly software- or hardware-oriented fault-mitigation techniques like ECC.
References
Showing 1-10 of 44 references
XNORBIN: A 95 TOp/s/W hardware accelerator for binary convolutional neural networks
- 2018 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)
XNORBIN is presented, a flexible accelerator for binary CNNs with computation tightly coupled to memory for aggressive data reuse supporting even non-trivial network topologies with large feature map volumes.
BRein Memory: A Single-Chip Binary/Ternary Reconfigurable in-Memory Deep Neural Network Accelerator Achieving 1.4 TOPS at 0.6 W
- IEEE Journal of Solid-State Circuits, 2018
In-memory neural network processing without any external data accesses, sustained by the symmetry and simplicity of the computation of the binary/ternary neural network, improves the energy efficiency dramatically.
A 65nm 4Kb algorithm-dependent computing-in-memory SRAM unit-macro with 2.3ns and 55.8TOPS/W fully parallel product-sum operation for binary DNN edge processors
- 2018 IEEE International Solid-State Circuits Conference (ISSCC)
This work implemented a 65nm 4Kb algorithm-dependent CIM-SRAM unit-macro and an in-house binary DNN structure for cost-aware DNN AI edge processors, resulting in the first binary-based CIM-SRAM macro with the fastest product-sum operation and the highest energy efficiency among reported CIM macros.
YodaNN: An Architecture for Ultralow Power Binary-Weight CNN Acceleration
- IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018
This paper presents an accelerator optimized for binary-weight CNNs that significantly outperforms the state-of-the-art in terms of energy and area efficiency and removes the need for expensive multiplications, as well as reducing I/O bandwidth and storage.
NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps
- IEEE Transactions on Neural Networks and Learning Systems, 2019
NullHop, a flexible and efficient CNN accelerator architecture that implements state-of-the-art CNNs for low-power and low-latency application scenarios and exploits the sparsity of neuron activations to accelerate computation and reduce memory requirements, is proposed.
XNOR-POP: A processing-in-memory architecture for binary Convolutional Neural Networks in Wide-IO2 DRAMs
- 2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)
A novel processing-in-memory architecture to run emerging binary CNN inference in Wide-IO2 DRAMs is presented, which improves CNN inference performance by 4× to 11× with small hardware and power overhead.
A Heterogeneous Multicore System on Chip for Energy Efficient Brain Inspired Computing
- IEEE Transactions on Circuits and Systems II: Express Briefs, 2018
Mia Wallace, a 65-nm system-on-chip integrating a near-threshold parallel processor cluster tightly coupled with a CNN accelerator, is presented; it achieves a peak energy efficiency of 108 GMAC/s/W at 0.72 V and a peak performance of 14 GMAC/s at 1.2 V.
FINN: A Framework for Fast, Scalable Binarized Neural Network Inference
- FPGA, 2017
FINN, a framework for building fast and flexible FPGA accelerators using a heterogeneous streaming architecture that implements fully connected, convolutional, and pooling layers, with per-layer compute resources tailored to user-provided throughput requirements, is presented.
Design Automation for Binarized Neural Networks: A Quantum Leap Opportunity?
- 2018 IEEE International Symposium on Circuits and Systems (ISCAS)
The hardware design and synthesis of a purely combinational BNN for ultra-low-power near-sensor processing is presented, featuring 10× higher energy efficiency and highlighting major design-automation opportunities raised by BNN models.
An always-on 3.8μJ/86% CIFAR-10 mixed-signal binary CNN processor with all memory on chip in 28nm CMOS
- 2018 IEEE International Solid-State Circuits Conference (ISSCC)
A mixed-signal binary CNN processor is presented that performs image classification of moderate complexity and employs near-memory computing to achieve a classification energy of 3.8 μJ, a 40× improvement over TrueNorth.