Corpus ID: 46895963

CascadeCNN: Pushing the performance limits of quantisation

Alexandros Kouris, Stylianos I. Venieris and Christos-Savvas Bouganis
This work presents CascadeCNN, an automated toolflow that pushes the quantisation limits of any given CNN model to achieve high-throughput inference by exploiting the computation time-accuracy trade-off. Without the need for retraining, a two-stage architecture tailored to any given FPGA device is generated, consisting of a low- and a high-precision unit. A confidence evaluation unit is employed between them to identify misclassified cases at run time and forward them to the high-precision…
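The two-stage scheme in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the model names are placeholders, and the confidence criterion here is a simple top-1 softmax threshold, one of several possible choices for the confidence evaluation unit.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def cascade_predict(x, low_precision_model, high_precision_model, threshold=0.9):
    """Run the fast low-precision stage first; re-run only low-confidence
    inputs on the slower high-precision stage."""
    probs = softmax(low_precision_model(x))
    if probs.max() >= threshold:        # confident: accept the cheap prediction
        return int(np.argmax(probs))
    # escalate hard cases to the high-precision unit
    return int(np.argmax(high_precision_model(x)))
```

Throughput gains then depend on how rarely inputs are escalated, which is exactly the trade-off the threshold controls.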

Citations

Caffe Barista: Brewing Caffe with FPGAs in the Training Loop

Barista is presented, an automated toolflow that provides seamless integration of FPGAs into the training of CNNs with the popular deep learning framework Caffe, providing the necessary infrastructure for further research and development.

Towards Efficient On-Board Deployment of DNNs on Intelligent Autonomous Systems

An overview of recent methods and hardware architectures that address the system-level challenges of modern DNN-enabled autonomous systems at both the algorithmic and hardware design level is presented.

Approximate LSTMs for Time-Constrained Inference: Enabling Fast Reaction in Self-Driving Cars

A progressive inference computing scheme is introduced that combines model pruning and computation restructuring to yield the best possible approximation of the result within the available latency budget of the target application.

MobiSR: Efficient On-Device Super-Resolution through Heterogeneous Mobile Processors

This work presents MobiSR, a novel framework for performing efficient super-resolution on-device, which considers popular model compression techniques and traverses the design space to reach the highest performing trade-off between image quality and processing speed.

Deploying Deep Neural Networks in the Embedded Space

This paper summarises recent work on the optimised mapping of DNNs on embedded settings, covering such diverse topics as DNN-to-accelerator toolflows, high-throughput cascaded classifiers and domain-specific model design to enable the deployment of sophisticated deep learning models on cutting-edge mobile and embedded systems.



References

Going deeper with convolutions

We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC14).

Very Deep Convolutional Networks for Large-Scale Image Recognition

This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

Fixed Point Quantization of Deep Convolutional Networks

This paper proposes a quantizer design for fixed-point implementation of DCNs, formulates and solves an optimization problem to identify the optimal fixed-point bit-width allocation across DCN layers, and demonstrates that fine-tuning can further enhance the accuracy of fixed-point DCNs beyond that of the original floating-point model.
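As a rough illustration of the fixed-point representation such work optimises, a uniform symmetric quantizer with a configurable fractional bit-width might look like this. It is a sketch only; the paper's actual quantizer design and per-layer bit-width optimisation are more involved, and the function name is hypothetical.

```python
import numpy as np

def to_fixed_point(x, total_bits=8, frac_bits=4):
    """Quantize floats to signed fixed point with `frac_bits` fractional bits,
    saturating at the representable range, then return the dequantized value
    (i.e. the number actually represented in hardware)."""
    scale = 2 ** frac_bits
    qmin = -(2 ** (total_bits - 1))      # most negative integer code
    qmax = 2 ** (total_bits - 1) - 1     # most positive integer code
    q = np.clip(np.round(x * scale), qmin, qmax)
    return q / scale
```

The bit-width allocation problem then amounts to choosing `total_bits` and `frac_bits` per layer so that this rounding and saturation error degrades accuracy as little as possible.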

Quantized Convolutional Neural Networks for Mobile Devices

This paper proposes an efficient framework, namely Quantized CNN, to simultaneously speed-up the computation and reduce the storage and memory overhead of CNN models.

fpgaConvNet: A Framework for Mapping Convolutional Neural Networks on FPGAs

Convolutional Neural Networks (ConvNets) are a powerful Deep Learning model, providing state-of-the-art accuracy to many emerging classification problems. However, ConvNet classification is a…

ImageNet classification with deep convolutional neural networks

A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.

Visualizing and Understanding Convolutional Networks

A novel visualization technique is introduced that gives insight into the function of intermediate feature layers and the operation of the classifier in large Convolutional Network models, used in a diagnostic role to find model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark.

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

This work introduces "deep compression", a three-stage pipeline of pruning, trained quantization and Huffman coding that together reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.
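The first two stages of such a pipeline can be sketched as follows. This is a simplified illustration assuming magnitude pruning and 1-D k-means weight sharing on a flat weight vector; the Huffman coding stage (which entropy-codes the centroid indices) and the retraining steps are omitted, and the function names are hypothetical.

```python
import numpy as np

def prune(weights, sparsity=0.5):
    """Stage 1 (pruning): zero out the smallest-magnitude weights so that
    roughly `sparsity` of them are removed."""
    thresh = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < thresh, 0.0, weights)

def share_weights(weights, n_clusters=4, iters=10):
    """Stage 2 (trained quantization / weight sharing): cluster the surviving
    weights with a simple 1-D k-means and replace each by its nearest shared
    centroid, so only centroid indices need to be stored."""
    out = weights.copy()
    mask = weights != 0
    nz = weights[mask]
    centroids = np.linspace(nz.min(), nz.max(), n_clusters)
    for _ in range(iters):
        assign = np.argmin(np.abs(nz[:, None] - centroids[None, :]), axis=1)
        for k in range(n_clusters):
            if np.any(assign == k):
                centroids[k] = nz[assign == k].mean()
    assign = np.argmin(np.abs(nz[:, None] - centroids[None, :]), axis=1)
    out[mask] = centroids[assign]
    return out
```

In the full method each stage is followed by fine-tuning, which is what recovers the accuracy lost to pruning and quantization.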

fpgaConvNet: A Toolflow for Mapping Diverse Convolutional Neural Networks on Embedded FPGAs

In recent years, Convolutional Neural Networks (ConvNets) have become an enabling technology for a wide range of novel embedded Artificial Intelligence systems. Across the range of applications, the…

Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations

A binary matrix multiplication GPU kernel is programmed with which it is possible to run the MNIST QNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy.
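The XNOR/popcount identity behind such binary kernels can be illustrated in plain NumPy. This is a sketch of the arithmetic only, not a GPU kernel: for vectors over {-1, +1}, the dot product equals twice the number of agreeing positions minus the length.

```python
import numpy as np

def binarize(x):
    """Map real values to {-1, +1} via sign (zero maps to +1)."""
    return np.where(x >= 0, 1, -1)

def xnor_popcount_dot(a_bits, b_bits):
    """Dot product of two {-1, +1} vectors via XNOR and popcount:
    dot(a, b) = 2 * popcount(XNOR(a, b)) - n, encoding +1 as bit 1."""
    n = a_bits.size
    a = (a_bits > 0).astype(np.uint8)   # {-1, +1} -> {0, 1}
    b = (b_bits > 0).astype(np.uint8)
    xnor = 1 - (a ^ b)                  # 1 where the bits agree
    return 2 * int(xnor.sum()) - n
```

On real hardware the bit vectors are packed into machine words, so one XNOR plus one popcount instruction replaces dozens of multiply-accumulates, which is the source of the reported speedups.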