Corpus ID: 56169108

Digital Neuron: A Hardware Inference Accelerator for Convolutional Deep Neural Networks

  • Hyunbin Park, Dohyun Kim, Shiho Kim
We propose a Digital Neuron, a hardware inference accelerator for convolutional deep neural networks with integer inputs and integer weights, targeting embedded systems. The main idea for reducing circuit area and power consumption is to compute dot products between input-feature and weight vectors with barrel shifters and parallel adders instead of multipliers. The reduced area allows more computational engines to be mounted on an inference accelerator, resulting in higher throughput than prior hardware accelerators.
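The shift-and-add idea can be emulated in software. The sketch below is an illustration of the general technique, not the paper's actual RTL: each integer multiply is decomposed into barrel shifts (one per set bit of the weight) that an adder tree then sums.

```python
def shift_add_multiply(x: int, w: int) -> int:
    """Multiply x by a non-negative integer weight w using only shifts
    and adds, as a barrel shifter plus adder tree would in hardware."""
    acc = 0
    bit = 0
    while w >> bit:
        if (w >> bit) & 1:      # for each set bit of the weight...
            acc += x << bit     # ...add x shifted by that bit position
        bit += 1
    return acc

def dot_product(xs, ws):
    """Dot product of integer input-feature and weight vectors built
    entirely from shift-add multiplies and a final parallel sum."""
    return sum(shift_add_multiply(x, w) for x, w in zip(xs, ws))
```

Weights with few set bits need few shift-add terms, which is where the area and power savings over a full multiplier come from.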
A Depthwise Separable Convolution Architecture for CNN Accelerator
A pipelined architecture for depthwise separable convolution followed by activation and pooling operations for a single CNN layer, implemented on a Xilinx 7-series FPGA and running at a clock period of 40 ns.
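For reference, a minimal pure-Python sketch of depthwise separable convolution (assumed valid padding and stride 1, not the paper's pipelined implementation): a per-channel KxK depthwise pass followed by a 1x1 pointwise pass that mixes channels.

```python
def depthwise_separable_conv(x, dw, pw):
    """x:  input feature map, shape [C][H][W] (nested lists)
       dw: depthwise kernels, one KxK kernel per channel: [C][K][K]
       pw: pointwise (1x1) weights mixing channels: [C_out][C]
       Returns output of shape [C_out][H-K+1][W-K+1]."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K = len(dw[0])
    Ho, Wo = H - K + 1, W - K + 1
    # Depthwise stage: each channel is convolved only with its own kernel.
    mid = [[[sum(x[c][i + u][j + v] * dw[c][u][v]
                 for u in range(K) for v in range(K))
             for j in range(Wo)] for i in range(Ho)] for c in range(C)]
    # Pointwise stage: a 1x1 convolution combines the depthwise outputs.
    return [[[sum(pw[o][c] * mid[c][i][j] for c in range(C))
              for j in range(Wo)] for i in range(Ho)]
            for o in range(len(pw))]
```

Splitting the convolution this way replaces one C_out·C·K·K multiply per output pixel with C·K·K + C_out·C, which is what makes the architecture attractive for FPGAs.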
Bit-Serial multiplier based Neural Processing Element with Approximate adder tree
This study designed a neural processing element with approximate adders that reduces resource utilization without changing the accuracy of deep learning algorithms, exploiting their fault-tolerance property.
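One common approximate-adder style is shown below as a sketch of the general idea; the paper's specific adder design may differ. The lowest k bits are combined with a carry-free OR while the upper bits are added exactly, shortening the critical path at the cost of a small bounded error.

```python
def approx_add(a: int, b: int, k: int = 4) -> int:
    """Add two non-negative integers: exact in the upper bits,
    approximate (bitwise OR, no carry chain) in the lowest k bits."""
    mask = (1 << k) - 1
    high = ((a >> k) + (b >> k)) << k   # exact addition of upper bits
    low = (a & mask) | (b & mask)       # carry-free OR of lower bits
    return high | low                   # high has zeros in the low k bits
```

Because deep networks tolerate small perturbations in accumulated sums, such adders can be dropped into an adder tree with negligible accuracy loss.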


DSIP: A Scalable Inference Accelerator for Convolutional Neural Networks
A scalable inference accelerator, the deep-learning-specific instruction-set processor (DSIP), supports various convolutional neural networks (CNNs) and enhances energy efficiency by 2.17×.
Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks
Ristretto is a fast and automated framework for CNN approximation that simulates the hardware arithmetic of a custom hardware accelerator, and can successfully condense CaffeNet and SqueezeNet to 8 bits.
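A minimal sketch of the kind of fixed-point quantization such a framework simulates (an assumed illustration, not Ristretto's API): values are rounded to signed 8-bit integers with a power-of-two scale and saturated to the representable range, then de-quantized for accuracy comparison.

```python
def quantize_fixed_point(values, bits=8, frac_bits=5):
    """Round each value to a signed `bits`-bit fixed-point number with
    `frac_bits` fractional bits, saturating at the representable range,
    then return the de-quantized (simulated) values."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    out = []
    for v in values:
        q = max(lo, min(hi, round(v * scale)))  # quantize + saturate
        out.append(q / scale)                   # de-quantize for comparison
    return out
```

Running a trained network through such a simulation layer by layer reveals which layers tolerate 8-bit arithmetic and which need more fractional bits.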
COSY: An Energy-Efficient Hardware Architecture for Deep Convolutional Neural Networks Based on Systolic Array
  • Chen Xin, Qiang Chen, +4 authors Bo Wang
  • Computer Science
  • 2017 IEEE 23rd International Conference on Parallel and Distributed Systems (ICPADS)
  • 2017
COSY (CNN on Systolic Array) is an energy-efficient hardware architecture based on a systolic array for CNNs; it achieves an over 15% reduction in energy consumption under the same constraints, and it is proved that COSY has an intrinsic ability for zero-skipping.
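The zero-skipping idea can be sketched in isolation (an illustration of the general technique, not COSY's implementation): multiplies with a zero activation are never issued, which in hardware gates the multiplier and saves energy.

```python
def zero_skipping_mac(activations, weights):
    """Return (dot_product, multiplies_performed), skipping any pair
    whose activation is zero (e.g. post-ReLU sparsity)."""
    acc, mults = 0, 0
    for a, w in zip(activations, weights):
        if a == 0:
            continue            # zero-skip: no multiply issued
        acc += a * w
        mults += 1
    return acc, mults
```

Since ReLU makes feature maps highly sparse, the fraction of skipped multiplies, and thus the energy saving, can be substantial.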
Memory-centric accelerator design for Convolutional Neural Networks
It is shown that the effects of the memory bottleneck can be reduced by a flexible memory hierarchy that supports the complex data access patterns in CNN workloads and ensures that on-chip memory size is minimized, which reduces area and energy usage.
Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network
This work designs Bit Fusion, a bit-flexible accelerator that constitutes an array of bit-level processing elements that dynamically fuse to match the bit width of individual DNN layers, and compares it to two state-of-the-art DNN accelerators, Eyeriss and Stripes.
Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks
This work implements a CNN accelerator on a VC707 FPGA board and compares it to previous approaches, achieving a peak performance of 61.62 GFLOPS at a 100 MHz working frequency, which outperforms previous approaches significantly.
UNPU: An Energy-Efficient Deep Neural Network Accelerator With Fully Variable Weight Bit Precision
An energy-efficient deep neural network (DNN) accelerator, the unified neural processing unit (UNPU), is proposed for mobile deep learning applications; it is the first DNN accelerator ASIC that supports fully variable weight bit precision from 1 to 16 bits.
Deep Convolutional Neural Network Inference with Floating-point Weights and Fixed-point Activations
It is shown that using floating-point numbers for weights is more efficient than fixed-point representation at the same bit width and enables compact hardware multiply-and-accumulate (MAC) unit design.
FPGA-based CNN inference accelerator synthesized from multi-threaded C software
A deep-learning inference accelerator is synthesized from a C-language software program parallelized with Pthreads, in which convolution, pooling, and padding are realized in the synthesized accelerator, while the remaining tasks execute on an embedded ARM processor.
BinaryConnect: Training Deep Neural Networks with binary weights during propagations
BinaryConnect is introduced, a method that trains a DNN with binary weights during the forward and backward propagations while retaining the precision of the stored weights in which gradients are accumulated; near-state-of-the-art results are obtained with BinaryConnect on permutation-invariant MNIST, CIFAR-10, and SVHN.
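BinaryConnect's core trick can be sketched as follows (a simplified illustration, not the paper's training code): the forward pass uses sign-binarized weights, while full-precision weights are kept and updated so that small gradient steps can accumulate.

```python
def binarize(w):
    """Deterministic binarization: +1 for w >= 0, else -1."""
    return [1 if wi >= 0 else -1 for wi in w]

def forward(x, w_real):
    """Dot product using binarized weights; multiplies reduce to sign flips."""
    wb = binarize(w_real)
    return sum(xi * wi for xi, wi in zip(x, wb))

def sgd_step(w_real, grad, lr=0.1):
    """Update the *real-valued* weights, then clip to [-1, 1] as in
    BinaryConnect so that binarization stays meaningful."""
    return [max(-1.0, min(1.0, wi - lr * gi)) for wi, gi in zip(w_real, grad)]
```

Because inference needs only the binarized weights, multiplications become additions and subtractions, which is exactly the property the hardware accelerators above exploit.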