Residue Number System-Based Solution for Reducing the Hardware Cost of a Convolutional Neural Network

@article{Chervyakov2020ResidueNS,
  title={Residue Number System-Based Solution for Reducing the Hardware Cost of a Convolutional Neural Network},
  author={Nikolai I. Chervyakov and Pavel A. Lyakhov and Maxim Anatolievich Deryabin and Nikolai Nagornov and Maria V. Valueva and Georgii V. Valuev},
  journal={Neurocomputing},
  year={2020},
  volume={407},
  pages={439-453}
}
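Since the paper's core idea is the residue number system (RNS), a minimal Python sketch may help fix the concept: an integer is represented by its residues modulo a set of pairwise coprime moduli, addition and multiplication act independently on each residue channel, and the integer is recovered with the Chinese Remainder Theorem. The moduli set below is an arbitrary illustration, not the set used in the paper.

```python
from math import prod

# Illustrative pairwise coprime moduli (an assumption, not the paper's set).
MODULI = (7, 11, 13, 15)  # dynamic range M = 7 * 11 * 13 * 15 = 15015

def to_rns(x: int, moduli=MODULI) -> tuple:
    """Forward conversion: represent x by its residues."""
    return tuple(x % m for m in moduli)

def from_rns(residues, moduli=MODULI) -> int:
    """Reverse conversion via the Chinese Remainder Theorem."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # pow(.., -1, m): modular inverse (Python 3.8+)
    return x % M

# Addition and multiplication are channel-wise, which is what lets hardware
# replace one wide arithmetic unit with several narrow, parallel ones.
a, b = 123, 45
s = tuple((ra + rb) % m for ra, rb, m in zip(to_rns(a), to_rns(b), MODULI))
p = tuple((ra * rb) % m for ra, rb, m in zip(to_rns(a), to_rns(b), MODULI))
assert from_rns(s) == a + b and from_rns(p) == a * b
```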
Citations

The Process of Training a General-Purpose Audio Classification Model
TLDR
This paper describes the basic steps of training a general-purpose audio classification model that predicts a limited number of distinct sounds, and it outlines the techniques employed when training any sound classification model, regardless of its intended use.
Quasi-Chaotic Oscillators Based on Modular Quantum Circuits
TLDR
The implementation of a quasi-chaotic oscillator based on quantum modular addition and multiplication is proposed, and it is proved that quantum computing allows the parallel processing of data, paving the way for a fast and robust multi-channel encryption/decryption scheme.

References

SHOWING 1-10 OF 39 REFERENCES
A deep convolutional neural network based on nested residue number system
TLDR
The nested RNS (NRNS), which recursively decomposes the RNS, can decompose the MAC unit into circuits of small size; the resulting balanced usage of FPGA resources leads to a high clock frequency with less hardware (a sketch of the underlying RNS decomposition follows).
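To make the MAC decomposition concrete, the sketch below shows plain (non-nested) RNS arithmetic, standing in for the NRNS design itself: the wide multiply-accumulate at the heart of a convolution is replaced by several narrow ones, one per modulus, each independent of the others and therefore parallelizable in hardware.

```python
from math import prod

MODULI = (7, 11, 13, 15)  # pairwise coprime; dynamic range M = 15015
M = prod(MODULI)

def rns_dot(weights, activations):
    """Dot product (the MAC core of a convolution) computed per residue
    channel; each channel is a narrow multiply-accumulate modulo m_i."""
    channels = []
    for m in MODULI:
        acc = 0
        for w, a in zip(weights, activations):
            acc = (acc + (w % m) * (a % m)) % m  # small-width MAC step
        channels.append(acc)
    return tuple(channels)

# Channel results agree with ordinary arithmetic as long as the true dot
# product fits in the dynamic range M (negative operands are normally
# mapped into [0, M) first; that step is omitted here).
w, a = [3, 2, 5], [10, 4, 7]
true = sum(x * y for x, y in zip(w, a))  # 73
assert all(true % m == c for m, c in zip(MODULI, rns_dot(w, a)))
```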
Efficient Hardware Architectures for Deep Convolutional Neural Network
TLDR
The theoretical derivation of the parallel fast finite impulse response algorithm (FFA) is introduced, and the corresponding fast convolution units (FCUs) are developed for computing convolutions in CNN models.
A High-speed Low-power Deep Neural Network on an FPGA based on the Nested RNS: Applied to an Object Detector
TLDR
The nested RNS (NRNS), which recursively decomposes the RNS, is used to decompose the MAC unit into circuits of small size, leading to a high clock frequency with less hardware.
Efficient convolution architectures for convolutional neural network
TLDR
An efficient top-level architecture for processing a complete convolutional layer in a CNN is developed, and the design of an FCU is coded in RTL and synthesized in TSMC 90 nm CMOS technology.
RNSnet: In-Memory Neural Network Acceleration Using Residue Number System
TLDR
RNSnet simplifies the fundamental neural network operations and maps them to in-memory addition and data access, achieving an 8.5× improvement in energy-delay product compared with state-of-the-art neural network accelerators.
Data and Hardware Efficient Design for Convolutional Neural Network
TLDR
An end-to-end CNN accelerator is presented that maximizes hardware utilization with run-time configurations for different kernel sizes and minimizes data bandwidth with an output-first strategy, improving the data reuse of the convolutional layers by up to 300×–600× compared with the non-reused case.
MALOC: A Fully Pipelined FPGA Accelerator for Convolutional Neural Networks With All Layers Mapped on Chip
TLDR
A new architecture for an FPGA-based CNN accelerator is proposed that maps every layer to its own on-chip unit, with all units working concurrently as a pipeline, achieving maximum resource utilization as well as optimal computational efficiency.
Increasing of convolutional neural network performance using residue number system
TLDR
The architecture of a convolutional neural network constructed with a residue number system for delay minimization is presented; using a special type of moduli accelerates the device by 37.4% compared with a binary number system and by 18.5% compared with a known residue number system realization.
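The "special type of moduli" is not spelled out in the summary; a common low-cost choice in the RNS literature is the set {2^n − 1, 2^n, 2^n + 1}, and the sketch below assumes that set purely for illustration. Its appeal is that all three residues reduce to bit masks and additions of n-bit chunks, avoiding general division in hardware.

```python
def residues_special(x: int, n: int):
    """Residues of x for the classic moduli set {2**n - 1, 2**n, 2**n + 1}.
    All three reduce to masking and chunk additions, which is what makes
    the set hardware-friendly."""
    m1, m2, m3 = (1 << n) - 1, (1 << n), (1 << n) + 1
    r2 = x & (m2 - 1)  # x mod 2**n: keep the low n bits
    # x mod (2**n - 1): since 2**n = 1 (mod m1), sum the n-bit chunks.
    r1, t = 0, x
    while t:
        r1, t = r1 + (t & m1), t >> n
    r1 %= m1
    # x mod (2**n + 1): since 2**n = -1 (mod m3), alternate chunk signs.
    r3, t, sign = 0, x, 1
    while t:
        r3, t, sign = r3 + sign * (t & m1), t >> n, -sign
    r3 %= m3
    return r1, r2, r3

# Sanity check against plain modulo reduction.
n, x = 4, 0xBEEF
assert residues_special(x, n) == (x % 15, x % 16, x % 17)
```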
ImageNet classification with deep convolutional neural networks
TLDR
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images of the ImageNet LSVRC-2010 contest into 1000 different classes, employing a recently developed regularization method called "dropout" that proved to be very effective.