FPGA based implementation of deep neural networks using on-chip memory only

@article{Park2016FPGABI,
  title={FPGA based implementation of deep neural networks using on-chip memory only},
  author={Jinhwan Park and Wonyong Sung},
  journal={2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2016},
  pages={1011-1015}
}
Deep neural networks (DNNs) demand a very large amount of computation and weight storage, so efficient implementation on special-purpose hardware is highly desirable. In this work, we have developed an FPGA-based fixed-point DNN system that uses only on-chip memory, avoiding accesses to external DRAM. The execution time and energy consumption of the developed system are compared with a GPU-based implementation. Since the memory capacity of an FPGA is limited, only 3-bit weights are used for this…
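As a rough illustration of why 3-bit weights make an on-chip-only design feasible, the sketch below quantizes a layer's floating-point weights to signed 3-bit codes with a single per-layer scale factor. This is a generic uniform-quantization scheme assumed for illustration, not the authors' exact method; the function and variable names are hypothetical.

```python
import numpy as np

def quantize_3bit(weights):
    """Uniformly quantize float weights to signed 3-bit codes in {-4, ..., 3}
    with one per-layer scale factor (a generic scheme assumed here, not the
    paper's exact quantizer)."""
    step = np.max(np.abs(weights)) / 4.0                  # per-layer step size
    codes = np.clip(np.round(weights / step), -4, 3).astype(np.int8)
    return codes, step

# Rough storage comparison for one 1024x1024 fully connected layer.
w = np.random.randn(1024, 1024).astype(np.float32)
codes, step = quantize_3bit(w)
w_hat = codes.astype(np.float32) * step                   # dequantized weights

print(f"float32 weights: {w.nbytes // 1024} KiB")         # 4096 KiB
print(f"3-bit codes:     {w.size * 3 // 8 // 1024} KiB")  # 384 KiB
```

For this example layer, 3-bit codes cut weight storage from 4096 KiB to 384 KiB, roughly the order of reduction needed to fit an entire network into FPGA block RAM.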
Deep neural network accelerator based on FPGA
  • T. Huynh
  • Computer Science
  • 2017 4th NAFOSTED Conference on Information and Computer Science
  • 2017
TLDR
The proposed neural network architecture employs a single physical computing layer to implement the entire computational fabric of fully connected feedforward deep neural networks, with a customizable number of layers, number of neurons per layer, and number of inputs.
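A minimal functional model of this time-multiplexing idea, assuming ReLU activations: one layer routine is reused for every logical layer, just as a single physical datapath would be. This is an illustrative software analogue, not Huynh's RTL design.

```python
import numpy as np

def forward(x, layers):
    """Evaluate an MLP by reusing one 'physical' layer routine for every
    logical layer (software analogue of a single-computing-layer design)."""
    a = x
    for W, b in layers:                 # same datapath, different weights
        a = np.maximum(W @ a + b, 0.0)  # ReLU assumed for illustration
    return a

# Example: three logical layers evaluated by the single routine above.
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((16, 32)), np.zeros(16)),
          (rng.standard_normal((16, 16)), np.zeros(16)),
          (rng.standard_normal((10, 16)), np.zeros(10))]
y = forward(rng.standard_normal(32), layers)
```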
Optimizing Neural Networks for Efficient FPGA Implementation: A Survey
TLDR
This paper surveys existing optimization techniques and evaluates them to provide a complete overview of FPGA-based DNN accelerators, showing how they can outperform GPGPUs because of their flexible architecture and low power consumption.
An FPGA-Based Hardware Accelerator for CNNs Using On-Chip Memories Only: Design and Benchmarking with Intel Movidius Neural Compute Stick
TLDR
A full on-chip field-programmable gate array hardware accelerator for a separable convolutional neural network, designed for a keyword spotting application; results show that better inference time and energy per inference can be obtained with comparable accuracy, at the expense of higher design effort and development time for the FPGA solution.
A Customized Hardware Architecture for Multi-layer Artificial Neural Networks on FPGA
TLDR
Experimental results show that the proposed neural network architecture is a very promising design choice for high-performance embedded recognition applications.
VHDL generator for a high performance convolutional neural network FPGA-based accelerator
TLDR
This paper proposes a tool that allows developers, through a configurable user interface, to automatically generate VHDL code for their desired CNN model; the generated design is modular, massively parallel, reconfigurable, scalable, fully pipelined, and adaptable to different CNN models.
A FPGA-Based, Granularity-Variable Neuromorphic Processor and Its Application in a MIMO Real-Time Control System
TLDR
The FBGVNP provides a new scheme for building ANNs that is flexible, highly energy-efficient, and applicable in many areas; experiments with a MIMO real-time control system validate the effectiveness of the neuromorphic processor.
Energy Adaptive Convolution Neural Network Using Dynamic Partial Reconfiguration
TLDR
This CNN is implemented on a Xilinx XC7Z020 FPGA, trained to recognize the MNIST dataset, and approximated through 7-bit quantization, which reduces accuracy by only 0.53%.
Bandwidth Efficient Architectures for Convolutional Neural Network
In recent years, convolutional neural networks (CNNs) have been evolving rapidly, and real-time CNN implementations in embedded systems are in high demand. It is necessary that high…
FPGA-Based Low-Power Speech Recognition with Recurrent Neural Networks
TLDR
This paper develops a neural-network-based real-time speech recognition (SR) system on an FPGA for very low-power operation and employs a statistical word-level language model (LM) to improve recognition accuracy.
PIE: A Pipeline Energy-Efficient Accelerator for Inference Process in Deep Neural Networks
TLDR
Experimental results indicate that PIE is 4.82x faster than a CPU and can reduce the energy consumption of CPU and GPU implementations by 355.35x and 12.02x, respectively.

References

Showing 1-10 of 18 references
X1000 real-time phoneme recognition VLSI using feed-forward deep neural networks
TLDR
This work develops a digital VLSI for phoneme recognition using deep neural networks and assesses the design in terms of throughput, chip size, and power consumption.
A fast deep learning system using GPU
TLDR
A variant of the deep belief network, called folded-DBN, is implemented on NVIDIA's Tesla K20 GPU, resulting in a 7 to 11 times speedup over the CPU platform.
Artificial neural networks in hardware: A survey of two decades of progress
This article presents a comprehensive overview of the hardware realizations of artificial neural network (ANN) models, known as hardware neural networks (HNN), appearing in academic studies as…
Fixed-point feedforward deep neural network design using weights +1, 0, and −1
TLDR
The designed fixed-point networks with ternary weights (+1, 0, and -1) and 3-bit signals show only negligible performance loss when compared to their floating-point counterparts.
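A minimal sketch of ternarization, assuming a simple threshold heuristic: weights near zero are dropped and the rest are mapped to ±1. The 0.7 × mean|w| threshold is a common heuristic assumed here for illustration; the referenced paper obtains its ternary weights through retraining rather than one-shot thresholding.

```python
import numpy as np

def ternarize(w, threshold_ratio=0.7):
    """Map float weights to {-1, 0, +1} with a per-layer threshold.
    The threshold_ratio heuristic is an assumption for illustration,
    not the referenced paper's retraining-based procedure."""
    delta = threshold_ratio * np.mean(np.abs(w))
    t = np.zeros_like(w, dtype=np.int8)
    t[w > delta] = 1
    t[w < -delta] = -1
    return t
```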
Optimized Deep Learning Architectures with Fast Matrix Operation Kernels on Parallel Platform
  • Y. Zhang, Saizheng Zhang
  • Computer Science
  • 2013 IEEE 25th International Conference on Tools with Artificial Intelligence
  • 2013
TLDR
An optimized deep learning architecture with flexible layer structures and fast matrix operation kernels on a parallel computing platform (e.g., NVIDIA GPUs) is introduced; the kernels save 70% time on average compared with those in NVIDIA's CUBLAS library.
Fixed point optimization of deep convolutional neural networks for object recognition
TLDR
The results indicate that quantization induces sparsity in the network, which reduces the effective number of network parameters and improves generalization; the quantized networks require one tenth of the memory storage and achieve better classification results than the high-precision networks.
Neural Network Adaptations to Hardware Implementations
TLDR
This section gives an overview of the issues encountered when mapping an ideal neural network model onto a compact and reliable hardware implementation, such as quantization, handling nonuniformities and nonideal responses, and restraining computational complexity.
Hardware accelerated convolutional neural networks for synthetic vision systems
TLDR
This system is fully digital and is a modular vision engine with the goal of performing real-time detection, recognition, and segmentation of mega-pixel images.
Learning both Weights and Connections for Efficient Neural Network
TLDR
A method to reduce the storage and computation required by neural networks by an order of magnitude, without affecting their accuracy, by learning only the important connections; redundant connections are pruned using a three-step method.
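The pruning step of the train-prune-retrain pipeline can be sketched as simple magnitude thresholding; the 90% sparsity target below is an illustrative choice, not a number from the paper.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.9):
    """Step 2 of the train -> prune -> retrain pipeline: drop the
    smallest-magnitude connections. `sparsity` is the fraction of
    connections removed (0.9 is illustrative, not from the paper)."""
    threshold = np.quantile(np.abs(w), sparsity)
    mask = np.abs(w) > threshold
    return w * mask, mask  # mask keeps pruned weights at zero during retraining
```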
Efficient digital implementation of the sigmoid function for reprogrammable logic
Special attention must be paid to efficient approximation of the sigmoid function when implementing artificial neural networks in FPGA-based reprogrammable hardware. Four previously published…
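One widely cited piecewise-linear sigmoid approximation (PLAN) uses only power-of-two coefficients, so a hardware version reduces to shifts and adds; whether PLAN is among the four schemes this reference compares is an assumption here. A Python rendering for clarity:

```python
def plan_sigmoid(x: float) -> float:
    """Piecewise-linear approximation of the logistic sigmoid (PLAN).
    All coefficients are powers of two, so an FPGA implementation
    needs only shifts and adds; shown in Python for readability."""
    ax = abs(x)
    if ax >= 5.0:
        y = 1.0
    elif ax >= 2.375:
        y = 0.03125 * ax + 0.84375
    elif ax >= 1.0:
        y = 0.125 * ax + 0.625
    else:
        y = 0.25 * ax + 0.5
    return y if x >= 0 else 1.0 - y  # exploit sigmoid(-x) = 1 - sigmoid(x)
```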