Residue Number System-Based Solution for Reducing the Hardware Cost of a Convolutional Neural Network
@article{Chervyakov2020ResidueNS,
  title={Residue Number System-Based Solution for Reducing the Hardware Cost of a Convolutional Neural Network},
  author={Nikolai I. Chervyakov and Pavel A. Lyakhov and Maxim Anatolievich Deryabin and Nikolai Nagornov and Maria V. Valueva and Georgii V. Valuev},
  journal={Neurocomputing},
  year={2020},
  volume={407},
  pages={439--453}
}
3 Citations
A generic FPGA-based hardware architecture for recursive least mean p-power extreme learning machine
- Computer Science
- Neurocomputing
- 2021
The Process of Training a General-Purpose Audio Classification Model
- Computer Science
- Proceedings of the International Scientific Conference - Sinteza 2022
- 2022
This paper describes the basic steps of training a general audio classification model which can predict a limited number of distinct sounds, and it outlines the techniques that are employed during the process of training any sound classification model, regardless of its intended usage.
Quasi-Chaotic Oscillators Based on Modular Quantum Circuits
- Computer Science, Physics
- 2022
The implementation of a quasi-chaotic oscillator based on quantum modular addition and multiplication is proposed and it is proved that quantum computing allows the parallel processing of data, paving the way for a fast and robust multi-channel encryption/decryption scheme.
References
A deep convolutional neural network based on nested residue number system
- Computer Science
- 2015 25th International Conference on Field Programmable Logic and Applications (FPL)
- 2015
The nested RNS (NRNS), which recursively decomposes the RNS, can decompose the MAC unit into small circuits; the resulting balanced usage of FPGA resources leads to a high clock frequency with less hardware.
Efficient Hardware Architectures for Deep Convolutional Neural Network
- Computer Science
- IEEE Transactions on Circuits and Systems I: Regular Papers
- 2018
The theoretical derivation of parallel fast finite impulse response algorithm (FFA) is introduced and the corresponding fast convolution units (FCUs) are developed for the computation of convolutions in the CNN models.
A High-speed Low-power Deep Neural Network on an FPGA based on the Nested RNS: Applied to an Object Detector
- Computer Science
- 2018 IEEE International Symposium on Circuits and Systems (ISCAS)
- 2018
The nested RNS (NRNS), which recursively decomposes the RNS, is used; it can decompose the MAC unit into small circuits, leading to a high clock frequency with less hardware.
Efficient convolution architectures for convolutional neural network
- Computer Science
- 2016 8th International Conference on Wireless Communications & Signal Processing (WCSP)
- 2016
An efficient top-level architecture for processing a complete convolutional layer in a CNN is developed and the design of an FCU is coded with RTL and synthesized with TSMC 90nm CMOS technology.
RNSnet: In-Memory Neural Network Acceleration Using Residue Number System
- Computer Science
- 2018 IEEE International Conference on Rebooting Computing (ICRC)
- 2018
RNSnet simplifies the fundamental neural network operations and maps them to in-memory addition and data access, and can achieve 8.5× higher energy-delay product as compared to the state-of-the-art neural network accelerators.
Data and Hardware Efficient Design for Convolutional Neural Network
- Computer Science
- IEEE Transactions on Circuits and Systems I: Regular Papers
- 2018
An end-to-end CNN accelerator is presented that maximizes hardware utilization with run-time configurations of different kernel sizes and minimizes data bandwidth with the output first strategy, improving the data reuse of the convolutional layers by up to $300\times \sim 600\times$ compared with the non-reused case.
MALOC: A Fully Pipelined FPGA Accelerator for Convolutional Neural Networks With All Layers Mapped on Chip
- Computer Science
- IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
- 2018
A new architecture for an FPGA-based CNN accelerator is proposed that maps all the layers to their own on-chip units, which work concurrently as a pipeline; it can achieve maximum resource utilization as well as optimal computational efficiency.
Increasing of convolutional neural network performance using residue number system
- Computer Science
- 2017 International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON)
- 2017
The architecture of a convolutional neural network constructed with the residue number system for delay minimization is presented; the use of a special type of modules accelerates the device by 37.4% compared with a binary number system and by 18.5% compared with a known residue number system implementation.
ImageNet classification with deep convolutional neural networks
- Computer Science
- Commun. ACM
- 2012
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Small sample image recognition using improved Convolutional Neural Network
- Computer Science
- J. Vis. Commun. Image Represent.
- 2018