XNORBIN: A 95 TOp/s/W hardware accelerator for binary convolutional neural networks

@inproceedings{Bahou2018XNORBINA9,
  title={XNORBIN: A 95 TOp/s/W hardware accelerator for binary convolutional neural networks},
  author={Andrawes Al Bahou and Geethan Karunaratne and Renzo Andri and Lukas Cavigelli and Luca Benini},
  booktitle={2018 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)},
  year={2018},
  pages={1--3}
}
Deploying state-of-the-art CNNs requires power-hungry processors and off-chip memory, which precludes their implementation in low-power embedded systems. Recent research shows that CNNs sustain extreme quantization, binarizing their weights and intermediate feature maps and thereby saving 8–32x memory while collapsing energy-intensive sum-of-products into XNOR-and-popcount operations. We present XNORBIN, a flexible accelerator for binary CNNs with computation tightly coupled to memory for aggressive…
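The collapse of sum-of-products into XNOR-and-popcount mentioned in the abstract can be sketched in a few lines. This is an illustrative example, not code from the paper: for vectors over {-1, +1} encoded as bitmasks (bit = 1 for +1), the dot product equals the number of sign agreements minus disagreements, which XNOR plus a popcount computes directly.

```python
# Sketch (not from the paper): a {-1,+1} dot product via XNOR-and-popcount.
def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two n-element {-1,+1} vectors encoded as bitmasks
    (bit = 1 means +1, bit = 0 means -1)."""
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)  # 1 wherever the signs agree
    matches = bin(xnor).count("1")              # popcount of the agreement mask
    return 2 * matches - n                      # agreements minus disagreements

# Cross-check against the plain sum-of-products on unpacked vectors:
a = [+1, -1, +1, +1]
b = [+1, +1, -1, +1]
ref = sum(x * y for x, y in zip(a, b))

pack = lambda v: sum(1 << i for i, x in enumerate(v) if x > 0)
assert binary_dot(pack(a), pack(b), len(a)) == ref
```

In hardware this replaces an n-element multiply-accumulate with one n-bit XNOR gate array and a popcount tree, which is the source of the energy savings the abstract refers to.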
Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine
Presents Hyperdrive: a BWN accelerator that dramatically reduces I/O bandwidth by exploiting a novel binary-weight streaming approach, and that supports arbitrarily sized convolutional neural network architectures and input resolutions by exploiting the natural scalability of the compute units at both chip level and system level.

Towards energy-efficient convolutional neural network inference
This thesis first evaluates the capabilities of off-the-shelf software-programmable hardware before diving into specialized hardware accelerators and exploring the potential of extremely quantized CNNs, giving special consideration to external memory bandwidth.

XNOR Neural Engine: A Hardware Accelerator IP for 21.6-fJ/op Binary Neural Network Inference
Introduces the XNOR Neural Engine (XNE), a fully digital, configurable hardware accelerator IP for BNNs, integrated within a microcontroller unit (MCU) equipped with an autonomous I/O subsystem and hybrid SRAM/standard-cell memory.

A PVT-robust Customized 4T Embedded DRAM Cell Array for Accelerating Binary Neural Networks
Proposes a PVT-robust accelerator architecture for BNNs with a computable 4T embedded DRAM (eDRAM) cell array, implementing the XNOR operation of BNNs in a time-multiplexed manner by utilizing the fundamental read operation of the conventional eDRAM cell.

Hyperdrive: A Systolically Scalable Binary-Weight CNN Inference Engine for mW IoT End-Nodes
Presents Hyperdrive: a BWN accelerator that dramatically reduces I/O bandwidth by exploiting a novel binary-weight streaming approach, and that handles high-resolution images by virtue of its systolically scalable architecture.

Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey
Reviews mainstream compression approaches such as compact models, tensor decomposition, data quantization, and network sparsification; discusses how to leverage these methods in the design of neural network accelerators; and presents state-of-the-art hardware architectures.

InS-DLA: An In-SSD Deep Learning Accelerator for Near-Data Processing
Designs an energy-efficient in-SSD deep learning accelerator, InS-DLA, for near-data processing that operates directly on the NAND flash inside the open-channel solid-state drive where the target data are stored, eliminating the power and performance overhead of data movement.

MB-CNN: Memristive Binary Convolutional Neural Networks for Embedded Mobile Devices
Proposes MB-CNN, a memristive accelerator for binary convolutional neural networks that performs XNOR convolution in situ using novel 2R memristive data blocks to improve the power, performance, and memory requirements of embedded mobile devices.

Ternary Compute-Enabled Memory using Ferroelectric Transistors for Accelerating Deep Neural Networks
Presents a non-volatile ternary compute-enabled memory cell (TeC-Cell) based on ferroelectric transistors (FeFETs) for in-memory computing in the signed ternary regime, showing a 3.3x–3.4x reduction in system energy and a 4x–7x improvement in system performance over SRAM- and FeFET-based near-memory accelerators across a wide range of DNN benchmarks.

Towards Fast and Energy-Efficient Binarized Neural Network Inference on FPGA
Proposes two types of fast, energy-efficient architectures for BNN inference and provides analysis and insights for choosing the better of the two strategies for different datasets and network models.

References

Showing 1–10 of 14 references
YodaNN: An Architecture for Ultralow Power Binary-Weight CNN Acceleration
Presents an accelerator optimized for binary-weight CNNs that significantly outperforms the state of the art in energy and area efficiency, removing the need for expensive multiplications while reducing I/O bandwidth and storage.

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference
Presents FINN, a framework for building fast and flexible FPGA accelerators using a flexible heterogeneous streaming architecture that implements fully connected, convolutional, and pooling layers, with per-layer compute resources tailored to user-provided throughput requirements.

An always-on 3.8μJ/86% CIFAR-10 mixed-signal binary CNN processor with all memory on chip in 28nm CMOS
Presents a mixed-signal binary CNN processor that performs image classification of moderate complexity and employs near-memory computing to achieve a classification energy of 3.8μJ, a 40x improvement over TrueNorth.

Fully parallel RRAM synaptic array for implementing binary neural network with (+1, −1) weights and (+1, 0) neurons
Analyzes a fully parallel RRAM synaptic array architecture that implements the fully connected layers of a convolutional neural network with (+1, −1) weights and (+1, 0) neurons; the proposed fully parallel BNN architecture (P-BNN) achieves 137.35 TOPS/W energy efficiency for inference.

Towards Accurate Binary Convolutional Neural Network
Shows that the resulting binary CNN, denoted ABC-Net, achieves performance much closer to its full-precision counterpart, even reaching comparable prediction accuracy on the ImageNet and forest-trail datasets given adequate binary weight bases and activations.

XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
Compares the Binary-Weight-Network version of AlexNet with recent network binarization methods, BinaryConnect and BinaryNet, and outperforms these methods by large margins on ImageNet, by more than 16% in top-1 accuracy.

An Analysis of Deep Neural Network Models for Practical Applications
Presents a comprehensive analysis of metrics important in practical applications: accuracy, memory footprint, parameters, operation count, inference time, and power consumption, providing a compelling set of information to help design and engineer efficient DNNs.
M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, "XNOR-Net: ImageNet classification using binary convolutional neural networks," in Proc. European Conference on Computer Vision, 2016.
R. Liu, J.-s. Seo, and S. Yu, "Fully parallel RRAM synaptic array for implementing binary neural network with (+1, −1) weights and (+1, 0) neurons," in Proc. IEEE Asia and South Pacific Design Automation Conference (ASP-DAC), 2018.