Recent advances in efficient computation of deep convolutional neural networks

@article{Cheng2018RecentAI,
  title={Recent advances in efficient computation of deep convolutional neural networks},
  author={Jian Cheng and Peisong Wang and Gang Li and Qinghao Hu and Hanqing Lu},
  journal={Frontiers of Information Technology \& Electronic Engineering},
  year={2018},
  volume={19},
  pages={64-77}
}
Deep neural networks have evolved remarkably over the past few years and they are currently the fundamental tools of many intelligent systems. At the same time, the computational complexity and resource consumption of these networks continue to increase. This poses a significant challenge to the deployment of such networks, especially in real-time applications or on resource-limited devices. Thus, network acceleration has become a hot topic within the deep learning community. As for hardware…
Learning on Hardware: A Tutorial on Neural Network Accelerators and Co-Processors
TLDR
An overview of existing neural network hardware accelerators and acceleration methods is given, along with a recommendation of suitable applications, focusing on accelerating the inference of convolutional neural networks used for image recognition tasks.
Deep Neural Network Approximation for Custom Hardware
TLDR
This article provides a comprehensive evaluation of approximation methods for high-performance network inference, along with an in-depth discussion of their effectiveness for custom hardware implementation, and includes proposals for future research based on a thorough analysis of current trends.
Approximate Multiply-Accumulate Array for Convolutional Neural Networks on FPGA
TLDR
An approximate high-speed implementation of the convolution stage of a CNN computing architecture, the Approximate Multiply-Accumulate Array, is proposed; it converts multiplications into additions and systolic accumulate operations, and allows power, area, and speed to be traded off against accuracy for specific data using different configurations.
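The entry above describes converting multiplications into additions. One classic way to do that is Mitchell's logarithm-based approximate multiplication; the Python sketch below is a rough illustration of that principle under that assumption, not the paper's actual array, and `approx_multiply` is a hypothetical helper:

```python
def approx_multiply(a: int, b: int) -> int:
    """Approximate a*b by adding truncated base-2 logarithms
    (Mitchell's algorithm), a classic way to turn a multiplier
    into an adder. Positive integers only; this is a sketch,
    not the cited paper's circuit."""
    if a == 0 or b == 0:
        return 0
    # log2(x) ~ k + f: k is the position of the leading one,
    # f the remaining bits read as a fraction in [0, 1).
    ka, kb = a.bit_length() - 1, b.bit_length() - 1
    fa = (a - (1 << ka)) / (1 << ka)
    fb = (b - (1 << kb)) / (1 << kb)
    log_sum = ka + kb + fa + fb        # addition replaces multiplication
    k = int(log_sum)
    f = log_sum - k
    return round((1 << k) * (1 + f))   # approximate antilogarithm

print(approx_multiply(27, 113), 27 * 113)  # ~2976 vs. 3051, ~2.5% error
```

Mitchell's approximation keeps the relative error within roughly 11% in the worst case, which is the kind of accuracy-for-hardware trade-off such configurable arrays expose.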
Accelerating CNN Inference on ASICs: A Survey
TLDR
A novel taxonomy to classify prior work is proposed, and some of the key contributions in these areas are described in detail.
Fine-grained Scheduling in FPGA-Based Convolutional Neural Networks
  • Wei Zhang, X. Liao, Hai Jin
  • Computer Science
  • 2020 IEEE 5th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA)
  • 2020
TLDR
This paper proposes FConv, in which the CPU and FPGA work together in a fine-grained manner, along with an analytical model for prediction that is used as a guide in task scheduling.
Deep Model Compression and Architecture Optimization for Embedded Systems: A Survey
TLDR
A survey of methods suitable for porting deep neural networks to resource-limited devices, especially smart cameras, is presented; it introduces methods to enhance network structures as well as neural architecture search techniques.
A novel channel pruning method for deep neural network compression
TLDR
A novel channel pruning method based on a genetic algorithm is proposed to compress very deep Convolutional Neural Networks (CNNs); it outperforms several state-of-the-art methods.
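As a rough sketch of how genetic-algorithm channel pruning works in principle, each individual encodes a binary keep/drop mask over channels and the population evolves toward masks with a good importance/size trade-off. The fitness below is a toy stand-in; the actual method would score candidate masks by validation accuracy and compression rate:

```python
import random

def evolve_channel_mask(n_channels=16, pop_size=20, generations=30,
                        keep_ratio=0.5):
    # Hypothetical per-channel importance (e.g., filter L1 norms).
    importance = [random.random() for _ in range(n_channels)]

    def fitness(mask):
        # Reward kept importance, penalize deviating from the target size.
        score = sum(w for w, m in zip(importance, mask) if m)
        return score - abs(sum(mask) - keep_ratio * n_channels)

    pop = [[random.randint(0, 1) for _ in range(n_channels)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]              # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_channels)  # one-point crossover
            child = a[:cut] + b[cut:]
            child[random.randrange(n_channels)] ^= 1  # point mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

print(evolve_channel_mask())  # binary mask: 1 = keep channel, 0 = prune
```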
CompactNet: Platform-Aware Automatic Optimization for Convolutional Neural Networks
TLDR
This work proposes a solution, called CompactNet, which automatically optimizes a pre-trained CNN model for a specific resource-limited platform given a target inference speedup, generating an optimal platform-specific model while maintaining accuracy.
A novel cognitive Wallace compressor based multi operand adders in CNN architecture for FPGA
  • T. Kowsalya
  • Computer Science
  • J. Ambient Intell. Humaniz. Comput.
  • 2021
TLDR
New cognitive Wallace compressor adder structures are proposed for optimizing the adder layers of convolutional neural networks (CNNs), replacing the traditional binary tree adders in CNN accelerator designs.
A Solution to Optimize Multi-Operand Adders in CNN Architecture on FPGA
TLDR
An optimization strategy based on the Wallace tree architecture is proposed in this article to replace the typical binary adder tree in CNN accelerator designs; experimental results show an improvement in area and performance in comparison with previous implementations.
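Both entries above replace the binary adder tree with Wallace-style compressor structures. A minimal behavioural sketch of the underlying idea, reducing many operands with 3:2 compressors (full adders) so that only one final carry-propagate addition is needed, might look like this; it illustrates carry-save reduction generically, not the papers' circuits:

```python
def carry_save_reduce(operands):
    """Reduce a list of non-negative integers to two terms with
    3:2 compressors, as in a Wallace tree: each step turns three
    operands into two without propagating carries."""
    terms = list(operands)
    while len(terms) > 2:
        a, b, c = terms.pop(), terms.pop(), terms.pop()
        s = a ^ b ^ c                                # bitwise sum, no carries
        carry = ((a & b) | (a & c) | (b & c)) << 1   # saved carries
        terms += [s, carry]
    # A single final carry-propagate addition finishes the sum.
    return sum(terms)

ops = [13, 7, 22, 5, 9]
assert carry_save_reduce(ops) == sum(ops)
```

The hardware appeal is that every reduction level works in constant time regardless of word width; the slow carry chain is paid only once at the end.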

References

Going Deeper with Embedded FPGA Platform for Convolutional Neural Network
TLDR
This paper presents an in-depth analysis of state-of-the-art CNN models, shows that convolutional layers are computation-centric while fully connected layers are memory-centric, and proposes a CNN accelerator design on an embedded FPGA for ImageNet large-scale image classification.
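The compute-centric versus memory-centric split can be seen from a back-of-the-envelope count of multiply-accumulates (MACs) versus weights per layer; the sketch below uses illustrative layer shapes, not figures from the paper:

```python
def conv_profile(cin, cout, k, h, w):
    """MACs and weight count for one k x k convolution layer
    producing a cout x h x w output from cin input channels."""
    macs = cin * cout * k * k * h * w   # weights reused at every position
    weights = cin * cout * k * k
    return macs, weights

def fc_profile(nin, nout):
    """MACs and weight count for one fully connected layer:
    every weight is used exactly once per input sample."""
    return nin * nout, nin * nout

# Illustrative shapes: the conv layer does ~28x the MACs of the FC
# layer with ~1/28 of the weights.
print(conv_profile(256, 256, 3, 28, 28))  # ~462M MACs, ~0.59M weights
print(fc_profile(4096, 4096))             # ~16.8M MACs, ~16.8M weights
```

This is why accelerator designs like the one above treat the two layer types differently: convolutions are bounded by arithmetic throughput, fully connected layers by weight bandwidth.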
Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks
TLDR
This work implements a CNN accelerator on a VC707 FPGA board and compares it to previous approaches, achieving a peak performance of 61.62 GFLOPS at a 100 MHz working frequency, which outperforms previous approaches significantly.
SCALEDEEP: A scalable compute architecture for learning and evaluating deep networks
TLDR
SCALEDEEP is a dense, scalable server architecture, whose processing, memory and interconnect subsystems are specialized to leverage the compute and communication characteristics of DNNs, and primarily targets DNN training, as opposed to only inference or evaluation.
SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing
TLDR
SC-DCNN, the first comprehensive design and optimization framework for SC-based DCNNs, is presented; using a bottom-up approach, it is holistically optimized to minimize area and power (energy) consumption while maintaining high network accuracy.
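In stochastic computing (SC), values are encoded as random bitstreams and a multiplication costs one gate per bit pair, which is where the area and power savings come from. Below is a generic sketch of bipolar SC multiplication, where an XNOR per bit pair multiplies the two encoded values in expectation; it illustrates SC arithmetic in general, not SC-DCNN's specific design:

```python
import random

def to_stream(x, n=4096):
    """Encode x in [-1, 1] as a bipolar stochastic bitstream:
    each bit is 1 with probability (x + 1) / 2."""
    p = (x + 1) / 2
    return [random.random() < p for _ in range(n)]

def from_stream(bits):
    """Decode a bipolar bitstream back to a value in [-1, 1]."""
    return 2 * sum(bits) / len(bits) - 1

def sc_multiply(x, y, n=4096):
    """Bipolar SC multiplication: bitwise XNOR of the two streams
    decodes to x*y in expectation (exact only as n -> infinity)."""
    sx, sy = to_stream(x, n), to_stream(y, n)
    return from_stream([a == b for a, b in zip(sx, sy)])

print(sc_multiply(0.5, -0.6), 0.5 * -0.6)  # approximately -0.3
```

The accuracy depends on stream length n, which is the area/latency-versus-precision knob such frameworks optimize.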
Cambricon-X: An accelerator for sparse neural networks
  • S. Zhang, Zidong Du, +6 authors Yunji Chen
  • Computer Science
  • 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
  • 2016
TLDR
A novel accelerator, Cambricon-X, is proposed to exploit the sparsity and irregularity of NN models for increased efficiency; experimental results show that it achieves, on average, a 7.23x speedup and 6.43x energy saving against the state-of-the-art NN accelerator.
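A software analogue of how a sparsity-aware accelerator skips zero weights: store only the nonzero values with their indices and fetch just the activations they select. This is a generic illustration; Cambricon-X's actual indexing hardware is more involved:

```python
def compress_sparse(weights):
    """Keep only nonzero weights plus their positions, the basic
    representation that lets an accelerator skip zero-valued work."""
    idx = [i for i, w in enumerate(weights) if w != 0]
    val = [weights[i] for i in idx]
    return idx, val

def sparse_dot(activations, idx, val):
    """Only activations selected by the index list are fetched and
    multiplied; zero weights cost neither a fetch nor a MAC."""
    return sum(activations[i] * v for i, v in zip(idx, val))

w = [0, 0.5, 0, 0, -1.2, 0, 0, 0.3]
a = [1, 2, 3, 4, 5, 6, 7, 8]
idx, val = compress_sparse(w)
print(sparse_dot(a, idx, val))  # 0.5*2 - 1.2*5 + 0.3*8 = -2.6
```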
From high-level deep neural models to FPGAs
TLDR
DnnWeaver, a framework that automatically generates a synthesizable accelerator for a given (DNN, FPGA) pair from a high-level specification in Caffe, is devised; the generated accelerator best matches the needs of the DNN while providing high performance and efficiency gains on the target FPGA.
BinaryConnect: Training Deep Neural Networks with binary weights during propagations
TLDR
BinaryConnect, a method that trains a DNN with binary weights during the forward and backward propagations while retaining the precision of the stored weights in which gradients are accumulated, is introduced; near state-of-the-art results are obtained on the permutation-invariant MNIST, CIFAR-10, and SVHN.
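A minimal NumPy sketch of the BinaryConnect idea: binarize the stored real-valued weights to +-1 for each propagation, apply the resulting gradients back to the real-valued weights, and clip them to [-1, 1]. The gradient below is a stand-in; a real training loop would backpropagate a loss:

```python
import numpy as np

rng = np.random.default_rng(0)
W_real = rng.normal(0, 0.1, size=(4, 3))  # full-precision stored weights

def forward(x, W_real):
    """Propagate with binarized weights; only W_real is ever updated."""
    Wb = np.sign(W_real)
    Wb[Wb == 0] = 1            # sign(0) -> +1 so weights stay binary
    return x @ Wb, Wb

def sgd_step(W_real, grad_Wb, lr=0.01):
    """Gradients computed w.r.t. the binary weights are applied to the
    stored real-valued weights, which are clipped to [-1, 1] so they
    cannot drift far without ever changing their sign."""
    W_real -= lr * grad_Wb
    np.clip(W_real, -1, 1, out=W_real)
    return W_real

x = rng.normal(size=(2, 4))
y, Wb = forward(x, W_real)
grad_Wb = x.T @ np.ones_like(y)   # stand-in for a backpropagated gradient
W_real = sgd_step(W_real, grad_Wb)
```

Because the propagations see only +-1 weights, every multiply in the forward and backward pass collapses to an add or subtract, which is the acceleration the paper targets.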
Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs
TLDR
The design of a BNN accelerator is presented that is synthesized from C++ to FPGA-targeted Verilog and outperforms existing FPGA-based CNN accelerators in GOPS as well as energy and resource efficiency.
Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks
TLDR
This work presents a systematic design space exploration methodology to maximize the throughput of an OpenCL-based FPGA accelerator for a given CNN model, considering the FPGA's resource constraints such as on-chip memory, registers, computational resources, and external memory bandwidth.
Hardware-software codesign of accurate, multiplier-free Deep Neural Networks
TLDR
This work proposes a novel approach to map floating-point-based DNNs to 8-bit dynamic fixed-point networks with integer power-of-two weights, with no change in network architecture, and proposes a hardware accelerator design to achieve low-power, low-latency inference with insignificant degradation in accuracy.
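With integer power-of-two weights, every multiplication by a weight reduces to a bit shift of the fixed-point activation, which is what makes the design multiplier-free. A minimal sketch of the idea; rounding and saturation details of the cited codesign are omitted, and both helpers are hypothetical:

```python
import math

def quantize_pow2(w):
    """Round a nonzero weight to the nearest signed power of two,
    returning (quantized value, exponent)."""
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))
    return sign * 2.0 ** exp, exp

def shift_multiply(x_fixed, sign, exp):
    """x_fixed is an integer fixed-point activation; a shift by |exp|
    replaces the multiplier entirely."""
    y = x_fixed << exp if exp >= 0 else x_fixed >> -exp
    return sign * y

w_q, exp = quantize_pow2(0.23)           # 0.23 -> 0.25 = 2**-2
print(w_q, shift_multiply(128, 1, exp))  # 128 * 0.25 = 32
```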