XNOR Neural Engine: A Hardware Accelerator IP for 21.6-fJ/op Binary Neural Network Inference

@article{Conti2018XNORNE,
  title={XNOR Neural Engine: A Hardware Accelerator IP for 21.6-fJ/op Binary Neural Network Inference},
  author={Francesco Conti and Pasquale Davide Schiavone and Luca Benini},
  journal={IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems},
  year={2018},
  volume={37},
  pages={2940--2951}
}
Binary neural networks (BNNs) promise to deliver accuracy comparable to conventional deep neural networks at a fraction of the cost in terms of memory and energy. … We show post-synthesis results in 65- and 22-nm technology for the XNOR Neural Engine (XNE) IP and post-layout results in 22 nm for the full MCU, indicating that this system can drop the energy cost per binary operation to 21.6 fJ per operation at 0.4 V, while remaining flexible and performant enough to execute state-of-the-art BNN topologies…
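
In a BNN, weights and activations are constrained to {-1, +1}, so the multiply-accumulate at the core of inference collapses to a bitwise XNOR followed by a population count; this is the "binary operation" behind figures such as 21.6 fJ/op. A minimal software sketch of that kernel follows, assuming 32-bit word packing and GCC/Clang builtins; the function name is illustrative, and the XNE realizes this datapath in dedicated hardware rather than code.

#include <stdint.h>

/* Illustrative model of a binarized dot product. Activations and weights are
 * packed one bit per lane: bit = 1 encodes +1, bit = 0 encodes -1. */
static inline int bnn_dot(const uint32_t *act, const uint32_t *wgt, int n_words)
{
    int pop = 0;
    for (int i = 0; i < n_words; i++)
        pop += __builtin_popcount(~(act[i] ^ wgt[i])); /* XNOR, count matching lanes */
    /* Signed sum = #matches - #mismatches over 32*n_words lanes in {-1,+1}. */
    return 2 * pop - 32 * n_words;
}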

Citations

A Multi-Precision Bit-Serial Hardware Accelerator IP for Deep Learning Enabled Internet-of-Things

The Serial-MAC-engine (SMAC-engine), a fully-digital hardware accelerator for inference of quantized DNNs suitable for integration in a heterogeneous System-on-Chip (SoC), is introduced.

BrainTTA: A 35 fJ/op Compiler Programmable Mixed-Precision Transport-Triggered NN SoC

A programmable SoC with mixed-precision support based on a Transport-Triggered Architecture (TTA) that achieves an energy per operation of 35/67/405 fJ/op (binary, ternary, and 8-bit precision, respectively) and a throughput of 614/307/77 GOPS, which is unprecedented for a programmable architecture.

Exploiting Processing in Non-Volatile Memory for Binary Neural Network Accelerators

This paper introduces a spintronic, reconfigurable in-memory accelerator for binary neural networks, NV-Net, which is capable of being used as a standard STT-MRAM array and a computational substrate simultaneously and allows for massively parallel and energy efficient computation.

Reconfigurable Binary Neural Network Accelerator with Adaptive Parallelism Scheme

The designed BNN accelerator is able to fully compute all types of BNN layers thanks to its reconfigurability, and it can achieve a higher area–speed efficiency than existing accelerators.

ChewBaccaNN: A Flexible 223 TOPS/W BNN Accelerator

ChewBaccaNN is presented, a 0.7 mm² binary convolutional neural network (CNN) accelerator designed in GlobalFoundries 22 nm technology that performs CIFAR-10 inference at 86.8% accuracy and runs a binarized ResNet-18 trained with 8-bases Group-Net to achieve 67.5% Top-1 accuracy.
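
(Aside for comparing entries in this list: TOPS/W and fJ/op are reciprocal measures of the same energy efficiency, related by fJ/op = 1000 / (TOPS/W). For example:

    1 / (223 TOPS/W) = 1 / (223 × 10¹² op/J) ≈ 4.5 fJ/op

so ChewBaccaNN's 223 TOPS/W corresponds to roughly 4.5 fJ/op against the XNE's 21.6 fJ/op, though such numbers are only loosely comparable, since papers differ in what counts as one "op" and in the operating point reported.)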

Hardware Approximate Techniques for Deep Neural Network Accelerators: A Survey

This article provides a comprehensive survey and analysis of hardware approximation techniques for DNN accelerators and presents how approximate computing for DNN accelerators can go beyond energy efficiency and address reliability and security issues as well.

Design framework for an energy-efficient binary convolutional neural network accelerator based on nonvolatile logic

A design framework for an energy-efficient BCNN accelerator based on nonvolatile logic is presented and a new design can be realized for accelerators that is different from that of conventional accelerators based solely on CMOS.

A Resource-Efficient Inference Accelerator for Binary Convolutional Neural Networks

This brief presents a novel architecture to implement a resource-efficient inference accelerator for binary convolutional neural networks (BCNNs). The proposed architecture consistently processes each …

PPAC: A Versatile In-Memory Accelerator for Matrix-Vector-Product-Like Operations

The Parallel Processor in Associative Content-addressable memory (PPAC), a novel in-memory accelerator that supports a range of matrix-vector-product (MVP)-like operations that find use in traditional and emerging applications, is proposed.

On the Resilience of Deep Learning for Reduced-voltage FPGAs

Kamyar Givaki, Behzad Salami, O. Unsal · 2020 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), 2020

It is found that modern FPGAs are robust enough at extremely low voltage levels and that low-voltage-related faults can be automatically masked within the training iterations, so there is no need for costly software- or hardware-oriented fault mitigation techniques such as ECC.
...

References

Showing 1-10 of 44 references.

XNORBIN: A 95 TOp/s/W hardware accelerator for binary convolutional neural networks

XNORBIN is presented, a flexible accelerator for binary CNNs with computation tightly coupled to memory for aggressive data reuse, supporting even non-trivial network topologies with large feature map volumes.

BRein Memory: A Single-Chip Binary/Ternary Reconfigurable in-Memory Deep Neural Network Accelerator Achieving 1.4 TOPS at 0.6 W

In-memory neural network processing without any external data accesses, sustained by the symmetry and simplicity of the computation of the binary/ternary neural network, improves the energy efficiency dramatically.

A 65nm 4Kb algorithm-dependent computing-in-memory SRAM unit-macro with 2.3ns and 55.8TOPS/W fully parallel product-sum operation for binary DNN edge processors

This work implemented a 65-nm 4-Kb algorithm-dependent CIM-SRAM unit-macro and an in-house binary DNN structure for cost-aware DNN AI edge processors, resulting in the first binary-based CIM-SRAM macro with the fastest product-sum (PS) operation and the highest energy efficiency among reported CIM macros.

YodaNN: An Architecture for Ultralow Power Binary-Weight CNN Acceleration

This paper presents an accelerator optimized for binary-weight CNNs that significantly outperforms the state-of-the-art in terms of energy and area efficiency, removes the need for expensive multiplications, and reduces I/O bandwidth and storage.
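
YodaNN binarizes only the weights while keeping multi-bit activations, so each multiplication by a weight in {-1, +1} reduces to a conditional add or subtract. A minimal sketch of that idea, with hypothetical names and types rather than the chip's actual datapath:

#include <stdint.h>

/* Illustrative binary-weight dot product: weights stored as sign bits
 * (1 encodes +1, 0 encodes -1); no hardware multiplier is needed. */
static int32_t bw_dot(const int16_t *act, const uint8_t *w_sign, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += w_sign[i] ? (int32_t)act[i] : -(int32_t)act[i];
    return acc;
}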

NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps

NullHop, a flexible and efficient CNN accelerator architecture that implements state-of-the-art (SOA) CNNs for low-power and low-latency application scenarios, is proposed; it exploits the sparsity of neuron activations to accelerate computation and reduce memory requirements.

XNOR-POP: A processing-in-memory architecture for binary Convolutional Neural Networks in Wide-IO2 DRAMs

A novel processing-in-memory architecture for emerging binary CNN inference in Wide-IO2 DRAMs is presented, which improves CNN inference performance by 4× ∼ 11× with small hardware and power overhead.

A Heterogeneous Multicore System on Chip for Energy Efficient Brain Inspired Computing

Mia Wallace is presented, a 65-nm system-on-chip integrating a near-threshold parallel processor cluster tightly coupled with a CNN accelerator that achieves peak energy efficiency of 108 GMAC/s/W at 0.72 V and peak performance of 14 GMAC/s at 1.2 V.

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

FINN, a framework for building fast and flexible FPGA accelerators, is presented; it uses a heterogeneous streaming architecture that implements fully connected, convolutional, and pooling layers, with per-layer compute resources tailored to user-provided throughput requirements.

Design Automation for Binarized Neural Networks: A Quantum Leap Opportunity?

The hardware design and synthesis of a purely combinational BNN for ultra-low power near-sensor processing is presented, featuring 10× higher energy efficiency, and the major opportunities raised by BNN models are discussed.

An always-on 3.8μJ/86% CIFAR-10 mixed-signal binary CNN processor with all memory on chip in 28nm CMOS

A mixed-signal binary CNN processor that performs image classification of moderate complexity and employs near-memory computing to achieve a classification energy of 3.8 μJ, a 40× improvement over TrueNorth.