# Deep Convolutional Neural Network Inference with Floating-point Weights and Fixed-point Activations

    @article{Lai2017DeepCN,
      title={Deep Convolutional Neural Network Inference with Floating-point Weights and Fixed-point Activations},
      author={Liangzhen Lai and Naveen Suda and Vikas Chandra},
      journal={ArXiv},
      year={2017},
      volume={abs/1703.03073}
    }

Deep convolutional neural network (CNN) inference requires a significant amount of memory and computation, which limits its deployment on embedded devices. … We show that using floating-point representation for weights is more efficient than fixed-point representation for the same bit-width, and demonstrate this on popular large-scale CNNs such as AlexNet, SqueezeNet, GoogLeNet and VGG-16. We also show that such a representation scheme enables a compact hardware multiply-and-accumulate (MAC) unit design…
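
As a rough illustration of the scheme the abstract describes (floating-point weights combined with fixed-point activations), the activation side can be sketched as a uniform fixed-point quantizer; the function name and Q-format parameters below are illustrative, not taken from the paper:

```python
import numpy as np

def quantize_activations(x, total_bits=8, frac_bits=4):
    """Quantize activations to a signed fixed-point grid.

    Values are rounded to the nearest multiple of 2**-frac_bits and
    clipped to the representable range of total_bits signed bits.
    """
    scale = 2.0 ** frac_bits
    qmin = -(2 ** (total_bits - 1))
    qmax = 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(x * scale), qmin, qmax)
    return q / scale  # dequantized value the MAC unit would consume

# Weights stay in floating point; only activations pass through this.
acts = np.array([0.13, -1.7, 0.5, 9.99])
print(quantize_activations(acts))
```

Note how the out-of-range activation (9.99) saturates at the top of the 8-bit range, while in-range values are snapped to the 2^-4 grid.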

## 77 Citations

### Short floating-point representation for convolutional neural network inference

- Computer Science
- IEICE Electron. Express
- 2019

The experimental results show that the short floating-point representation with 8-bit total width achieves less-than-1-percentage-point degradation without the aid of retraining in the top-5 accuracy on very deep CNNs of up to 152 layers and gives more than a 60% area reduction in the ASIC implementation.

### Phoenix: A Low-Precision Floating-Point Quantization Oriented Architecture for Convolutional Neural Networks

- Computer Science
- 2020

A normalization-oriented 8-bit floating-point quantization processor, named Phoenix, is proposed to reduce storage and memory access with negligible accuracy loss, and a hardware processor is designed to address the hardware inefficiency caused by floating-point multipliers.

### Quantization of deep neural networks for accumulator-constrained processors

- Computer Science
- Microprocess. Microsystems
- 2020

### Deep Neural Network Approximation for Custom Hardware

- Computer Science
- ACM Comput. Surv.
- 2019

This article provides a comprehensive evaluation of approximation methods for high-performance network inference along with in-depth discussion of their effectiveness for custom hardware implementation and includes proposals for future research based on a thorough analysis of current trends.

### Quantization of Constrained Processor Data Paths Applied to Convolutional Neural Networks

- Computer Science
- 2018 21st Euromicro Conference on Digital System Design (DSD)
- 2018

A layer-wise quantization heuristic to find a good fixed-point network approximation for platforms without wide accumulation registers is proposed and it is demonstrated that 16-bit accumulators are able to obtain a Top-1 classification accuracy within 1% of the floating-point baselines on the CIFAR-10 and ILSVRC2012 image classification benchmarks.
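
The accumulator constraint driving this work can be illustrated with a small worst-case bound: the number of full-range signed products a narrow accumulator can sum before overflow. The helper below is a hypothetical sketch, not code from the paper:

```python
def max_safe_accumulations(weight_bits, act_bits, acc_bits):
    """Worst-case number of signed products an acc_bits-wide
    accumulator can sum without overflow, assuming operands may
    take their full-range magnitude of 2**(bits-1)."""
    max_product = (2 ** (weight_bits - 1)) * (2 ** (act_bits - 1))
    acc_max = 2 ** (acc_bits - 1) - 1
    return acc_max // max_product

# Full-range 8-bit operands overflow a 16-bit accumulator almost
# immediately, which is why layer-wise bit-width tuning is needed.
print(max_safe_accumulations(8, 8, 16))  # 1
print(max_safe_accumulations(4, 4, 16))  # 511
```

This worst-case bound is pessimistic; the layer-wise heuristic in the paper exploits the actual value distributions to do much better than it.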

### Low Precision Floating Point Arithmetic for High Performance FPGA-based CNN Acceleration

- Computer Science
- FPGA
- 2020

To the best of the authors' knowledge, this is the first in-depth study to simplify one multiplication for CNN inference to one 4-bit MAC and to implement four multiplications within one DSP while maintaining comparable accuracy without any re-training.

### An Energy-Efficient Sparse Deep-Neural-Network Learning Accelerator With Fine-Grained Mixed Precision of FP8–FP16

- Computer Science
- IEEE Solid-State Circuits Letters
- 2019

This letter presents an energy-efficient DNN learning accelerator core supporting CNN and FC learning as well as inference, with the following three key features: 1) fine-grained mixed precision (FGMP); 2) compressed sparse DNN learning/inference; and 3) an input load balancer.

### Zero-Centered Fixed-Point Quantization With Iterative Retraining for Deep Convolutional Neural Network-Based Object Detectors

- Computer Science
- IEEE Access
- 2021

In the proposed method, the center of the weight distribution is adjusted to zero by subtracting the mean of weight parameters before quantization, and the retraining process is iteratively applied to minimize the accuracy drop caused by quantization.
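
The zero-centering step described above amounts to removing the weight mean before a symmetric uniform quantizer is applied. A minimal sketch (function name and bit-width are illustrative, and the iterative retraining loop is omitted):

```python
import numpy as np

def zero_centered_quantize(w, bits=8):
    """Subtract the mean so the weight distribution is centered at
    zero, then apply symmetric uniform quantization. The mean is kept
    as a floating-point offset to add back at inference time.
    Assumes w is not constant (otherwise scale would be zero)."""
    mu = w.mean()
    centered = w - mu
    scale = np.abs(centered).max() / (2 ** (bits - 1) - 1)
    q = np.round(centered / scale).astype(np.int32)
    return q, scale, mu  # reconstruct with q * scale + mu
```

Centering matters because a symmetric quantizer wastes levels when the distribution is skewed; subtracting the mean lets the full integer range cover the actual spread of the weights.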

### A Variable Precision Approach for Deep Neural Networks

- Computer Science
- 2019 International Conference on Advanced Technologies for Communications (ATC)
- 2019

This work investigates a hardware implementation of multiply-and-add with variable bit precision that can be adjusted at computation time, and shows that the proposed system can achieve an accuracy of up to 88%.

## References

Showing 1-10 of 30 references.

### Fixed Point Quantization of Deep Convolutional Networks

- Computer Science
- ICML
- 2016

This paper proposes a quantizer design for fixed-point implementation of DCNs, formulates and solves an optimization problem to identify the optimal fixed-point bit-width allocation across DCN layers, and demonstrates that fine-tuning can further enhance the accuracy of fixed-point DCNs beyond that of the original floating-point model.

### Reduced-Precision Strategies for Bounded Memory in Deep Neural Nets

- Computer Science
- ArXiv
- 2015

This work investigates how using reduced precision data in Convolutional Neural Networks affects network accuracy during classification and proposes a method for finding a low precision configuration for a network while maintaining high accuracy.

### Hardware-oriented Approximation of Convolutional Neural Networks

- Computer Science
- ArXiv
- 2016

Ristretto is a model approximation framework that analyzes a given CNN with respect to the numerical resolution used in representing the weights and outputs of convolutional and fully connected layers, and can condense models by using fixed-point arithmetic and representation instead of floating point.

### Training deep neural networks with low precision multiplications

- Computer Science
- 2014

It is found that very low precision is sufficient not just for running trained networks but also for training them, and it is possible to train Maxout networks with 10-bit multiplications.

### Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks

- Computer Science
- ArXiv
- 2016

Ristretto is a fast and automated framework for CNN approximation which simulates the hardware arithmetic of a custom hardware accelerator, and can successfully condense CaffeNet and SqueezeNet to 8-bit.

### Deep Learning with Limited Numerical Precision

- Computer Science
- ICML
- 2015

The results show that deep networks can be trained using only 16-bit wide fixed-point number representation when using stochastic rounding, and incur little to no degradation in the classification accuracy.
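
Stochastic rounding, the key ingredient in that result, rounds toward each neighboring grid point with probability proportional to proximity, so the quantization error is zero-mean in expectation. A sketch with illustrative parameters (not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(x, frac_bits=8):
    """Round to a fixed-point grid of step 2**-frac_bits, rounding
    up with probability equal to the fractional distance from the
    lower grid point, so rounding is unbiased in expectation."""
    scaled = x * (2.0 ** frac_bits)
    floor = np.floor(scaled)
    prob_up = scaled - floor
    rounded = floor + (rng.random(np.shape(x)) < prob_up)
    return rounded / (2.0 ** frac_bits)
```

For example, with a grid step of 0.5 the value 0.3 is rounded to 0.5 with probability 0.6 and to 0.0 with probability 0.4, so repeated roundings average back to 0.3; this is what lets gradient information survive a coarse 16-bit grid during training.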

### Accelerating Deep Convolutional Networks using low-precision and sparsity

- Computer Science
- 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2017

This work achieves the highest reported accuracy with extremely low-precision (2-bit) weight networks, and builds a deep learning accelerator core, DLAC, that can achieve up to 1 TFLOP/mm2 equivalent for single-precision floating-point operations.

### Going Deeper with Embedded FPGA Platform for Convolutional Neural Network

- Computer Science
- FPGA
- 2016

This paper presents an in-depth analysis of state-of-the-art CNN models, showing that convolutional layers are computation-centric and fully-connected layers are memory-centric, and proposes a CNN accelerator design on an embedded FPGA for ImageNet large-scale image classification.

### Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding

- Computer Science
- ICLR
- 2016

This work introduces "deep compression", a three-stage pipeline of pruning, trained quantization and Huffman coding whose stages work together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.
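
The first stage of that pipeline, magnitude pruning, can be sketched as follows; the threshold selection here is a simplified illustration (the paper prunes iteratively with retraining, which is omitted):

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    """Stage 1 of the deep-compression pipeline: zero out the
    smallest-magnitude weights until the requested fraction of the
    tensor is pruned. Surviving weights are left untouched."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    # k-th smallest magnitude becomes the pruning threshold.
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)
```

The resulting sparse tensor is what the later stages (weight sharing and Huffman coding) compress further.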

### Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

- Computer Science
- 2016

A binary matrix multiplication GPU kernel is written with which it is possible to run the MNIST BNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy.
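
Deterministic binarization, and the XNOR-plus-popcount trick that makes such a fast kernel possible, can be sketched in NumPy (this is an illustrative model of the arithmetic, not the GPU kernel itself):

```python
import numpy as np

def binarize(w):
    """Constrain weights to +1/-1 via the sign function
    (zero maps to +1), as in deterministic BNN binarization."""
    return np.where(np.asarray(w) >= 0, 1, -1).astype(np.int8)

def binary_dot(a_bin, b_bin):
    """With +1/-1 operands, a dot product reduces to XNOR and a
    popcount: encode -1 as bit 0 and +1 as bit 1, count matching
    bits, then dot = 2 * matches - n."""
    a_bits = a_bin > 0
    b_bits = b_bin > 0
    matches = np.count_nonzero(a_bits == b_bits)  # popcount of XNOR
    return 2 * matches - a_bin.size
```

Packing 32 or 64 such bits into one machine word is what turns each multiply-accumulate into a single bitwise instruction on real hardware.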