CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices
The CirCNN architecture is proposed: a universal DNN inference engine that can be implemented on various hardware/software platforms with a configurable network architecture (e.g., layer type, size, and scale). FFT serves as the key computing kernel, which ensures universal and small-footprint implementations.
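The core trick behind block-circulant compression is that a circulant matrix is fully defined by its first column, and its matrix-vector product is a circular convolution, computable via the FFT in O(d log d) instead of O(d^2). A minimal NumPy sketch of this idea (an illustration of the principle, not the paper's hardware implementation):

```python
import numpy as np

def circulant_matvec_fft(c, x):
    """Multiply the circulant matrix whose first column is c by vector x.

    The product equals the circular convolution of c and x, so it can be
    computed as an element-wise product in the frequency domain.
    """
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

# Check against the explicit dense circulant matrix C[i, j] = c[(i - j) % d].
d = 4
c = np.array([1.0, 2.0, 3.0, 4.0])
x = np.array([0.5, -1.0, 2.0, 0.0])
C = np.array([[c[(i - j) % d] for j in range(d)] for i in range(d)])
assert np.allclose(C @ x, circulant_matvec_fft(c, x))
```

Storing one length-d vector per block instead of d^2 entries is where the compression comes from; the FFT path is where the speedup comes from.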
C-LSTM: Enabling Efficient LSTM using Structured Compression Techniques on FPGAs
This work proposes a comprehensive framework called C-LSTM to automatically optimize and implement a wide range of LSTM variants on FPGAs, achieving up to 18.8X and 33.5X gains in performance and energy efficiency, respectively, compared with the state-of-the-art LSTM implementation under the same experimental setup.
SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing
SC-DCNN is presented, the first comprehensive design and optimization framework for SC-based DCNNs. Using a bottom-up approach, it is holistically optimized to minimize area and power (energy) consumption while maintaining high network accuracy.
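Stochastic computing (SC) gets its area and power savings by encoding each value as the probability of a 1 in a random bitstream, so arithmetic shrinks to single logic gates: in unipolar SC, one AND gate per bit multiplies two values. A toy software illustration of that encoding (not SC-DCNN's actual hardware design):

```python
import random

def to_stream(p, n, rng):
    """Encode a value p in [0, 1] as an n-bit stochastic bitstream."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def sc_multiply(a_stream, b_stream):
    """Bitwise AND of two independent unipolar streams: P(a AND b) = P(a) * P(b)."""
    return [a & b for a, b in zip(a_stream, b_stream)]

def decode(stream):
    """Recover the encoded value as the fraction of 1s in the stream."""
    return sum(stream) / len(stream)

rng = random.Random(0)
n = 100_000
a, b = 0.6, 0.5
prod = decode(sc_multiply(to_stream(a, n, rng), to_stream(b, n, rng)))
assert abs(prod - a * b) < 0.01  # close to 0.3, up to stochastic noise
```

The accuracy-versus-latency trade-off is visible here: longer bitstreams reduce the variance of the decoded result, which is why SC frameworks must be optimized holistically rather than gate by gate.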
VIBNN: Hardware Acceleration of Bayesian Neural Networks
This paper proposes VIBNN, an FPGA-based hardware accelerator for variational inference on BNNs. It introduces two high-performance Gaussian (pseudo)random number generators: the RAM-based Linear Feedback Gaussian Random Number Generator (RLF-GRNG), inspired by the properties of the binomial distribution and linear feedback logic, and a Bayesian-neural-network-oriented Wallace Gaussian random number generator.
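The binomial property the RLF-GRNG builds on is that a sum of k fair random bits follows Binomial(k, 0.5), which by the central limit theorem approaches N(k/2, k/4). A plain-software sketch of that underlying principle (the paper's RAM/linear-feedback hardware design is not reproduced here):

```python
import random

def binomial_gaussian(k, rng):
    """Approximate a standard Gaussian sample by summing k fair random bits.

    Binomial(k, 0.5) has mean k/2 and variance k/4; subtracting the mean and
    dividing by the standard deviation yields roughly N(0, 1) for large k.
    """
    s = sum(rng.getrandbits(1) for _ in range(k))
    return (s - k / 2) / (k / 4) ** 0.5

rng = random.Random(1)
samples = [binomial_gaussian(64, rng) for _ in range(10_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
assert abs(mean) < 0.05 and abs(var - 1.0) < 0.1
```

Summing bits maps naturally onto cheap hardware counters, which is why binomial-based generators are attractive for on-chip Gaussian sampling in BNN inference.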
E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs
The Efficient RNN (E-RNN) framework is presented. The alternating direction method of multipliers (ADMM) is used for more accurate block-circulant training, and two design explorations provide guidance on choosing the block size and on reducing the number of RNN training trials.
HEIF: Highly Efficient Stochastic Computing-Based Inference Framework for Deep Neural Networks
  • Zhe Li, Ji Li, +8 authors Yanzhi Wang
  • Computer Science
  • IEEE Transactions on Computer-Aided Design of…
  • 1 August 2019
HEIF is presented, a highly efficient SC-based inference framework for large-scale DCNNs, with broad applications including (but not limited to) LeNet-5 and AlexNet, that achieves high energy efficiency and low area/hardware cost.
REQ-YOLO: A Resource-Aware, Efficient Quantization Framework for Object Detection on FPGAs
Experimental results show that the proposed REQ-YOLO framework can significantly compress the YOLO model while introducing very small accuracy degradation. The paper presents the detailed hardware implementation of block-circulant matrices in CONV layers and a structure supporting heterogeneous weight quantization.
Hardware-driven nonlinear activation for stochastic computing based deep convolutional neural networks
This paper designs and optimizes SC-based neurons and proposes highly accurate activation designs for the three most frequently used activation functions in software DCNNs, i.e., hyperbolic tangent, logistic, and rectified linear units.
FFT-based deep learning deployment in embedded systems
This work proposes a Fast Fourier Transform-based DNN training and inference model suitable for embedded platforms, with reduced asymptotic complexity in both computation and storage, and develops and deploys the FFT-based inference model on embedded platforms, achieving extraordinary processing speed.
FTRANS: energy-efficient acceleration of transformers using FPGA
This paper proposes an efficient acceleration framework, FTRANS, for transformer-based large-scale language representations. It includes an enhanced block-circulant matrix (BCM)-based weight representation that enables model compression of large-scale language representations at the algorithm level with little accuracy degradation.