Application of the residue number system to reduce hardware costs of the convolutional neural network implementation

@article{Valueva2020ApplicationOT,
  title={Application of the residue number system to reduce hardware costs of the convolutional neural network implementation},
  author={Maria V. Valueva and Nikolai Nagornov and Pavel A. Lyakhov and Georgii V. Valuev and Nikolai I. Chervyakov},
  journal={Math. Comput. Simul.},
  year={2020},
  volume={177},
  pages={232--243}
}

AN OPTIMIZED CONVOLUTIONAL NEURAL NETWORK FOR HANDWRITTEN DIGITAL RECOGNITION CLASSIFICATION
TLDR
A prediction model for the handwritten digit classification problem was developed using a CNN, achieving near state-of-the-art performance on the MNIST handwritten digit recognition task in Python with the Keras deep learning library.
Formal Verification of Deep Neural Networks in Hardware
TLDR
A lightweight formal verification model is created by applying the neural network simplification approaches discussed in the paper, and it verifies intricate deep neural network designs more effectively than conventional simulation-based strategies.
Enhancing Detection Performance of Face Recognition Algorithm Using PCA-Faster R-CNN
TLDR
The proposed face recognition system using Principal Component Analysis and Faster R-CNN achieves higher recall and accuracy than Faster R-CNN without PCA, and is more efficient than recent work in facial recognition.
International Journal of Electrical and Computer Engineering (IJECE)
TLDR
Comparative results against state-of-the-art techniques on the same data show the competitive performance of the proposed framework for simultaneous liver and tumor segmentation.
Optimization of a Convolutional Neural Network for the Automated Diagnosis of Melanoma
TLDR
The aim of this study was to optimize convolutional neural network algorithms for the automated diagnosis of melanoma; it was hypothesized that optimal selection of the momentum and batch hyperparameters increases model accuracy.
ReFACE: Efficient Design Methodology for Acceleration of Digital Filter Implementations
TLDR
A systematic methodology to efficiently implement computing in-memory (CIM) accelerators for FIR filters using various CMOS and post-CMOS technologies, referred to as ReFACE, which leverages a residue number system (RNS) to speed up the essential operations of digital filters.
Traffic Sign Classification Using Convolutional Neural Network
TLDR
A convolutional neural network model is used to classify traffic signs; it outperforms previous models, with an accuracy of 99.6% on the validation set.
BengaliNet: A Low-Cost Novel Convolutional Neural Network for Bengali Handwritten Characters Recognition
TLDR
This research proposed a low-cost novel convolutional neural network architecture for the recognition of Bengali characters, with only 2.24 to 2.43 million parameters depending on the number of output classes, to help develop an automated high-performance recognition tool for Bengali handwritten characters.
Enhanced convolutional neural network based android software maintainability prediction
TLDR
Simulation results for maintainability prediction in Android application development show that the proposed enhanced CNN achieves better results than other standard classification models.

References

SHOWING 1-10 OF 35 REFERENCES
Increasing of convolutional neural network performance using residue number system
TLDR
The architecture of a convolutional neural network built on the residue number system for delay minimization is presented; using moduli of a special type speeds up the device by 37.4% compared with a binary number system and by 18.5% compared with a known residue number system implementation.
A deep convolutional neural network based on nested residue number system
TLDR
The nested RNS (NRNS), which recursively decomposes the RNS, can decompose the MAC unit into small circuits; the resulting balanced usage of FPGA resources leads to a high clock frequency with less hardware.
Efficient Hardware Architectures for Deep Convolutional Neural Network
TLDR
The theoretical derivation of parallel fast finite impulse response algorithm (FFA) is introduced and the corresponding fast convolution units (FCUs) are developed for the computation of convolutions in the CNN models.
Data and Hardware Efficient Design for Convolutional Neural Network
TLDR
An end-to-end CNN accelerator is presented that maximizes hardware utilization with run-time configurations of different kernel sizes and minimizes data bandwidth with the output-first strategy, improving the data reuse of the convolutional layers by up to $300\times \sim 600\times$ compared with the non-reused case.
Efficient hardware architecture of softmax layer in deep neural network
  • Bo Yuan
  • Computer Science
    2016 29th IEEE International System-on-Chip Conference (SOCC)
  • 2016
TLDR
For the first time, this paper presents an efficient hardware architecture for the softmax layer in DNNs, using the domain transformation technique and a down-scaling approach, and shows that the proposed architecture achieves reduced hardware complexity and critical path delay.
A High-speed Low-power Deep Neural Network on an FPGA based on the Nested RNS: Applied to an Object Detector
TLDR
The nested RNS (NRNS), which recursively decomposes the RNS, is used to decompose the MAC unit into small circuits, leading to a high clock frequency with less hardware.
High performance training of deep neural networks using pipelined hardware acceleration and distributed memory
TLDR
This paper presents a scalable pipelined hardware architecture with distributed memories for a digital neuron to implement deep neural networks and explores various functions and algorithms as well as different memory topologies, to optimize the performance of the training architecture.
FPGA implementation of a real-time super-resolution system with a CNN based on a residue number system
TLDR
An FPGA implementation and performance evaluation of a CNN-based super-resolution system that can process moving images in real time and achieves a superior peak signal-to-noise ratio (PSNR) compared with other systems using pre-enlargement.
Small sample image recognition using improved Convolutional Neural Network
FPGA-Based Accelerators of Deep Learning Networks for Learning and Classification: A Review
TLDR
The techniques investigated in this paper represent the recent trends in the FPGA-based accelerators of deep learning networks and are expected to direct the future advances on efficient hardware accelerators and to be useful for deep learning researchers.