# FPGA based implementation of deep neural networks using on-chip memory only

@article{Park2016FPGABI, title={FPGA based implementation of deep neural networks using on-chip memory only}, author={Jinhwan Park and Wonyong Sung}, journal={2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, year={2016}, pages={1011-1015} }

Deep neural networks (DNNs) demand a very large amount of computation and weight storage, and thus efficient implementation using special-purpose hardware is highly desired. In this work, we have developed an FPGA-based fixed-point DNN system that uses only on-chip memory, so that external DRAM is never accessed. The execution time and energy consumption of the developed system are compared with a GPU-based implementation. Since the memory capacity of an FPGA is limited, only 3-bit weights are used for this…
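
To make the 3-bit weight storage concrete, the sketch below quantizes floating-point weights to eight signed levels with a single per-tensor scale. The `quantize_3bit` helper, the per-tensor scale, and the absence of retraining are illustrative assumptions; the excerpt does not spell out the exact fixed-point optimization the system uses.

```python
import numpy as np

def quantize_3bit(weights, bits=3):
    """Uniformly quantize a weight tensor to 2**bits signed levels.

    A minimal sketch of low-precision weight storage of the kind the paper
    relies on; the actual system applies retraining-based fixed-point
    optimization, which this example omits.
    """
    n_pos = 2 ** (bits - 1) - 1                 # 3 positive levels for 3 bits
    scale = np.max(np.abs(weights)) / n_pos     # per-tensor step size (assumption)
    codes = np.clip(np.round(weights / scale), -n_pos - 1, n_pos)
    return codes.astype(np.int8), scale         # 3-bit codes plus one float scale

def dequantize(codes, scale):
    """Reconstruct approximate weights from the stored codes."""
    return codes.astype(np.float32) * scale

# Example: quantize a small random layer and inspect the error
w = np.random.randn(4, 4).astype(np.float32)
codes, scale = quantize_3bit(w)
print(np.max(np.abs(w - dequantize(codes, scale))))  # bounded by scale / 2
```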

#### 49 Citations

Deep neural network accelerator based on FPGA

- Computer Science
- 2017 4th NAFOSTED Conference on Information and Computer Science
- 2017

The proposed neural network architecture employs a single physical computing layer to perform the entire computation of fully connected feedforward deep neural networks, with a customizable number of layers, neurons per layer, and inputs.

Optimizing Neural Networks for Efficient FPGA Implementation: A Survey

- Computer Science
- 2021

This paper surveys existing optimization techniques and evaluates them to provide a complete overview of FPGA-based DNN accelerators, showing how they can outperform GPGPUs thanks to their flexible architecture and low power consumption.

An FPGA-Based Hardware Accelerator for CNNs Using On-Chip Memories Only: Design and Benchmarking with Intel Movidius Neural Compute Stick

- Computer Science
- Int. J. Reconfigurable Comput.
- 2019

A fully on-chip field-programmable gate array hardware accelerator for a separable convolutional neural network, designed for a keyword-spotting application; the results show that the FPGA solution achieves better inference time and energy per inference with comparable accuracy, at the expense of higher design effort and development time.

A Customized Hardware Architecture for Multi-layer Artificial Neural Networks on FPGA

- Computer Science
- 2018

Experimental results show that the proposed neural network architecture is a very promising design choice for high-performance embedded recognition applications.

VHDL generator for a high performance convolutional neural network FPGA-based accelerator

- Computer Science
- 2017 International Conference on ReConFigurable Computing and FPGAs (ReConFig)
- 2017

This paper proposes a tool that allows developers, through a configurable user interface, to automatically generate VHDL code for their desired CNN model; the generated design is modular, massively parallel, reconfigurable, scalable, fully pipelined, and adaptive to different CNN models.

A FPGA-Based, Granularity-Variable Neuromorphic Processor and Its Application in a MIMO Real-Time Control System

- Computer Science, Medicine
- Sensors
- 2017

The FBGVNP provides a new scheme for building ANNs that is flexible, highly energy-efficient, and applicable in many areas, and the application results validate the effectiveness of the neuromorphic processor.

Energy Adaptive Convolution Neural Network Using Dynamic Partial Reconfiguration

- Computer Science
- 2020 IEEE 63rd International Midwest Symposium on Circuits and Systems (MWSCAS)
- 2020

This CNN is implemented on a Xilinx XC7Z020 FPGA and trained to recognize the MNIST dataset; it is approximated through quantization, which reduces the accuracy by only 0.53% while using 7 bits for the implementation.

Bandwidth Efficient Architectures for Convolutional Neural Network

- Computer Science
- 2018 IEEE International Workshop on Signal Processing Systems (SiPS)
- 2018

In recent years, convolutional neural networks (CNNs) have been rapidly evolving, and real-time CNN implementations in embedded systems are becoming highly demanding. It is necessary that high…

FPGA-Based Low-Power Speech Recognition with Recurrent Neural Networks

- Computer Science
- 2016 IEEE International Workshop on Signal Processing Systems (SiPS)
- 2016

This paper develops a neural-network-based real-time speech recognition (SR) system on an FPGA for very low-power operation and employs a statistical word-level language model (LM) to improve the recognition accuracy.

PIE: A Pipeline Energy-Efficient Accelerator for Inference Process in Deep Neural Networks

- Computer Science
- 2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS)
- 2016

Experimental results indicate that PIE is 4.82x faster than a CPU and reduces the energy consumption of the CPU and GPU by 355.35x and 12.02x, respectively.

#### References

Showing 1-10 of 18 references

X1000 real-time phoneme recognition VLSI using feed-forward deep neural networks

- Computer Science
- 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2014

This work develops a digital VLSI for phoneme recognition using deep neural networks and assesses the design in terms of throughput, chip size, and power consumption.

A fast deep learning system using GPU

- Computer Science
- 2014 IEEE International Symposium on Circuits and Systems (ISCAS)
- 2014

A variant of the deep belief network, called folded-DBN, is implemented on NVIDIA's Tesla K20 GPU, resulting in a 7 to 11 times speedup over the CPU platform.

Artificial neural networks in hardware: A survey of two decades of progress

- Computer Science
- Neurocomputing
- 2010

This article presents a comprehensive overview of the hardware realizations of artificial neural network (ANN) models, known as hardware neural networks (HNN), appearing in academic studies as…

Fixed-point feedforward deep neural network design using weights +1, 0, and −1

- Computer Science
- 2014 IEEE Workshop on Signal Processing Systems (SiPS)
- 2014

The designed fixed-point networks with ternary weights (+1, 0, and -1) and 3-bit signals show only negligible performance loss when compared to the floating-point counterparts.
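
A minimal sketch of ternary weight quantization in the spirit of this reference is shown below; the magnitude threshold (a fraction of the mean absolute weight) and the `ternarize` helper are illustrative assumptions, and the cited work additionally retrains the network with the quantized weights in the loop.

```python
import numpy as np

def ternarize(weights, threshold_ratio=0.7):
    """Map weights to {-1, 0, +1} using a magnitude threshold.

    A sketch of ternary weight quantization; the threshold rule (a fraction
    of the mean absolute weight) is an assumption, and the cited work also
    retrains the network with the ternarized weights.
    """
    delta = threshold_ratio * np.mean(np.abs(weights))  # assumed threshold
    t = np.zeros_like(weights, dtype=np.int8)
    t[weights > delta] = 1
    t[weights < -delta] = -1
    return t
```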

Optimized Deep Learning Architectures with Fast Matrix Operation Kernels on Parallel Platform

- Computer Science
- 2013 IEEE 25th International Conference on Tools with Artificial Intelligence
- 2013

An optimized deep learning architecture with flexible layer structures and fast matrix operation kernels on a parallel computing platform (e.g., NVIDIA's GPU) is introduced; it saves 70% of the time on average compared with the kernels in NVIDIA's CUBLAS library.

Fixed point optimization of deep convolutional neural networks for object recognition

- Computer Science
- 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2015

The results indicate that quantization induces sparsity in the network, which reduces the effective number of network parameters and improves generalization; the quantized networks reduce the required memory storage to roughly 1/10 and achieve better classification results than the high-precision networks.

Neural Network Adaptations to Hardware Implementations

- Computer Science
- 1997

This work gives an overview of the various issues encountered when mapping an ideal neural network model onto a compact and reliable neural network hardware implementation, such as quantization, handling nonuniformities and nonideal responses, and restraining computational complexity.

Hardware accelerated convolutional neural networks for synthetic vision systems

- Computer Science
- Proceedings of 2010 IEEE International Symposium on Circuits and Systems
- 2010

This system is fully digital and is a modular vision engine with the goal of performing real-time detection, recognition and segmentation of mega-pixel images.

Learning both Weights and Connections for Efficient Neural Network

- Computer Science
- NIPS
- 2015

A method to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy, by learning only the important connections and pruning redundant connections with a three-step method.
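
The prune step of such a three-step pipeline (train, prune low-magnitude connections, retrain with the mask fixed) can be sketched as follows; the 90% sparsity target and the `prune_by_magnitude` helper are illustrative assumptions rather than the cited paper's exact procedure.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity=0.9):
    """Zero out the smallest-magnitude connections and return a binary mask.

    A sketch of the prune step in a train-prune-retrain pipeline; during the
    retrain step the mask would be held fixed so pruned connections stay zero.
    The 90% sparsity target is an illustrative assumption.
    """
    k = int(weights.size * sparsity)                      # number of weights to drop
    threshold = np.sort(np.abs(weights), axis=None)[k]    # magnitude cutoff
    mask = (np.abs(weights) >= threshold).astype(weights.dtype)
    return weights * mask, mask

# Example: prune a small random layer to roughly 90% sparsity
w = np.random.randn(8, 8).astype(np.float32)
w_pruned, mask = prune_by_magnitude(w)
print(1.0 - mask.mean())  # fraction of connections removed
```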

Efficient digital implementation of the sigmoid function for reprogrammable logic

- Mathematics
- 2003

Special attention must be paid to an efficient approximation of the sigmoid function when implementing artificial neural networks on FPGA-based reprogrammable hardware. Four previously published…
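
One common hardware-friendly scheme is a piecewise-linear approximation that replaces the exponential with a clipped line, as in the sketch below; this particular approximation is an illustrative assumption and is not necessarily one of the four schemes the reference evaluates.

```python
import numpy as np

def sigmoid_pwl(x):
    """Piecewise-linear sigmoid approximation suited to fixed-point logic.

    Uses the line y = 0.25 * x + 0.5, clipped to [0, 1], so the output
    saturates at 0 for x <= -2 and at 1 for x >= 2. This is one simple
    hardware-friendly choice, assumed here for illustration only.
    """
    return np.clip(0.25 * x + 0.5, 0.0, 1.0)

# Example: compare against the exact sigmoid on a few sample points
x = np.linspace(-4, 4, 9)
print(np.max(np.abs(sigmoid_pwl(x) - 1.0 / (1.0 + np.exp(-x)))))  # worst-case error on the samples
```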