Fast convolutional neural networks on FPGAs with hls4ml

@article{Aarrestad2021FastCN,
  title={Fast convolutional neural networks on FPGAs with hls4ml},
  author={Thea Klaeboe Aarrestad and Vladimir Loncar and Nicol{\`o} Ghielmetti and Maurizio Pierini and Sioni Summers and Jennifer Ngadiuba and Christoffer Petersson and Hampus Linander and Yutaro Iiyama and Giuseppe Di Guglielmo and Javier Mauricio Duarte and Philip C. Harris and Dylan S. Rankin and Sergo Jindariani and Kevin Pedro and Nhan Viet Tran and Miaoyuan Liu and Edward Kreinar and Zhenbin Wu and Duc Hoang},
  journal={Machine Learning: Science and Technology},
  year={2021},
  volume={2}
}
We introduce an automated tool for deploying ultra low-latency, low-power deep neural networks with convolutional layers on field-programmable gate arrays (FPGAs). By extending the hls4ml library, we demonstrate an inference latency of 5 µs using convolutional architectures, targeting microsecond latency applications like those at the CERN Large Hadron Collider. Considering benchmark models trained on the Street View House Numbers Dataset, we demonstrate various methods for model compression in… 
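One common compression step when mapping networks to FPGAs is reducing weights and activations to narrow fixed-point types. hls4ml, for instance, expresses layer precision as HLS fixed-point types.

The block below is a minimal sketch and is not hls4ml code. `quantize_fixed` is a hypothetical helper that emulates a signed fixed-point format with `width` total bits and `integer` integer bits (sign included). It rounds to the nearest representable value and saturates on overflow, which is only one of several possible rounding and overflow choices.

```python
def quantize_fixed(values, width=16, integer=6):
    """Hypothetical helper: emulate a signed fixed-point format in floating point.

    width   -- total number of bits
    integer -- bits before the binary point, including the sign bit
    """
    frac_bits = width - integer
    step = 2.0 ** -frac_bits            # value of one least-significant bit
    lo = -(2.0 ** (integer - 1))        # most negative representable value
    hi = 2.0 ** (integer - 1) - step    # most positive representable value
    out = []
    for v in values:
        q = round(v / step) * step      # nearest multiple of step (Python rounds ties to even)
        out.append(min(max(q, lo), hi)) # saturate rather than wrap on overflow
    return out


# 8 bits with 3 integer bits: step 1/32, range [-4, 4 - 1/32]
print(quantize_fixed([0.1, 3.14159, 100.0, -100.0], width=8, integer=3))
```

Narrowing `width` shrinks the multipliers and registers needed per operation, at the cost of the rounding and clipping error visible in the printed values.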

De-specializing an HLS library for Deep Neural Networks: improvements upon hls4ml

Custom hardware accelerators for Deep Neural Networks are increasingly popular: in fact, the flexibility and performance offered by FPGAs are well-suited to the computational effort and low latency…

hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices

hls4ml, an open-source software-hardware co-design workflow that interprets and translates machine learning algorithms for implementation on FPGAs and ASICs, is developed specifically to support domain scientists.

Nanosecond machine learning event classification with boosted decision trees in FPGA for high energy physics

This work presents a novel implementation of classification using boosted decision trees (BDTs), a machine learning/artificial intelligence method, on field-programmable gate arrays (FPGAs), with the aim of providing decisions at the lowest possible latency for real-time event classification.
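For orientation, a boosted decision tree ensemble classifies an event by routing its features through every tree and summing the leaf scores. The block below is plain Python and is not the FPGA implementation described above. The `Tree` array layout, the `bdt_decision` helper, and the default cut of 0.0 are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tree:
    """Hypothetical flat-array tree layout; node 0 is the root."""
    feature: List[int]      # feature index tested at each node (-1 marks a leaf)
    threshold: List[float]  # split threshold at each internal node
    left: List[int]         # child taken when x[feature] <= threshold
    right: List[int]        # child taken otherwise
    value: List[float]      # leaf score (unused for internal nodes)


def tree_score(tree: Tree, x: List[float]) -> float:
    """Walk from the root to a leaf and return that leaf's score."""
    node = 0
    while tree.feature[node] != -1:
        if x[tree.feature[node]] <= tree.threshold[node]:
            node = tree.left[node]
        else:
            node = tree.right[node]
    return tree.value[node]


def bdt_decision(trees: List[Tree], x: List[float], cut: float = 0.0):
    """Boosted ensemble: the raw score is the sum of every tree's leaf score."""
    score = sum(tree_score(t, x) for t in trees)
    return score, score > cut


# Two depth-1 trees ("stumps") on a two-feature event
t1 = Tree([0, -1, -1], [0.5, 0.0, 0.0], [1, -1, -1], [2, -1, -1], [0.0, -1.0, 1.0])
t2 = Tree([1, -1, -1], [2.0, 0.0, 0.0], [1, -1, -1], [2, -1, -1], [0.0, -0.5, 0.5])
print(bdt_decision([t1, t2], [0.7, 1.0]))
```

The per-tree traversals do not depend on one another. That independence is what makes an ensemble attractive to evaluate in parallel in firmware. The sequential loop here is only for readability.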

FPGA-Based Acceleration of Convolutional Neural Network for Gesture Recognition Using mm-Wave FMCW Radar

This study uses CNNs to classify hand gestures obtained using mmWave Frequency-Modulated Continuous Wave radar, and discusses the complete workflow from radar data preprocessing and model compression (quantization and pruning) to the final implementation of the CNN accelerator on FPGAs.
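Pruning, one of the two compression steps named above, is often done by zeroing the smallest-magnitude weights. The block below is a minimal sketch of one-shot magnitude pruning under that assumption. The study's own pruning procedure is not described here, and `magnitude_prune` is a hypothetical helper.

```python
from typing import List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Hypothetical helper: zero the fraction `sparsity` of weights with the smallest |w|."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest magnitude (stable for ties)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


print(magnitude_prune([0.5, -0.1, 0.05, -0.8], sparsity=0.5))
```

On an FPGA, a zeroed weight is typically useful because the corresponding multiplication can be dropped from the generated logic.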

Graph Neural Networks for Charged Particle Tracking on FPGAs

An automated translation workflow for converting GNNs into firmware for field-programmable gate arrays (FPGAs) is introduced and integrated into a broader tool called hls4ml. This workflow could enable the inclusion of charged particle tracking GNNs at the trigger level for HL-LHC experiments.

Performance Comparison of Generic and Quantized Fully Connected and Convolutional Neural Networks for Real-Time Signal/Background Classification

A comparison of fully connected and convolutional NNs for a potential real-time signal/background classification method shows that convolutional models slightly outperform fully connected architectures in both the generic and quantized cases.

Data-Model-Hardware Tri-Design for Energy-Efficient Video Intelligence

A data-model-hardware tri-design framework for high-throughput, low-cost, and high-accuracy multi-object tracking (MOT) on high-definition (HD) video streams is proposed.

Data-Model-Circuit Tri-Design for Ultra-Light Video Intelligence on Edge Devices

A data-model-hardware tri-design framework for high-throughput, low-cost, and high-accuracy multi-object tracking (MOT) on high-definition (HD) video streams is proposed.

A Simplified Correlation Index for Fast Real-Time Pulse Shape Recognition

A simplified correlation index is proposed for real-time pulse shape recognition systems. It can be implemented efficiently in FPGA devices using far fewer logic resources while retaining excellent performance.
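The simplified index itself is not defined in this summary, so the block below does not reproduce it. For reference, it shows only the conventional Pearson correlation coefficient between a digitized pulse and a reference template. The square root and division in this formula are the kind of operations that are typically costly in FPGA logic. `pearson_correlation` is a hypothetical helper name.

```python
import math
from typing import Sequence


def pearson_correlation(pulse: Sequence[float], template: Sequence[float]) -> float:
    """Conventional Pearson correlation between a sampled pulse and a template.

    This is the standard formula, not the simplified index proposed above.
    """
    n = len(pulse)
    if n == 0 or n != len(template):
        raise ValueError("pulse and template must be non-empty and of equal length")
    mean_p = sum(pulse) / n
    mean_t = sum(template) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pulse, template))
    var_p = sum((p - mean_p) ** 2 for p in pulse)
    var_t = sum((t - mean_t) ** 2 for t in template)
    if var_p == 0 or var_t == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / math.sqrt(var_p * var_t)


# A scaled and offset copy of the template correlates perfectly
print(pearson_correlation([0, 2, 8, 4, 2], [0, 1, 4, 2, 1]))
```

Because the coefficient is insensitive to gain and baseline offset, it compares pulse shape rather than amplitude.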

FastStamp: Accelerating Neural Steganography and Digital Watermarking of Images on FPGAs

This work proposes a parameter-efficient DNN model for embedding recoverable bit-strings in image pixels. The model can match the success metrics of prior state-of-the-art DNN-based watermarking methods while being significantly faster and lighter in terms of memory footprint.

References

Compressing deep neural networks on FPGAs to binary and ternary precision with hls4ml

We present the implementation of binary and ternary neural networks in the hls4ml library, designed to automatically convert deep neural network models to digital circuits with field-programmable…
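Binary precision keeps only the sign of each weight. Ternary precision also allows zero, so that weights with small magnitudes can be dropped.

The block below is a minimal sketch of both mappings under stated assumptions: a fixed, user-chosen ternary threshold, zero mapped to +1 when binarizing, and no per-layer scale factor. These choices are assumptions and not necessarily what the hls4ml implementation does. `binarize` and `ternarize` are hypothetical helpers.

```python
from typing import List


def binarize(weights: List[float]) -> List[int]:
    """Binary precision: keep only the sign; this sketch maps 0 to +1."""
    return [1 if w >= 0 else -1 for w in weights]


def ternarize(weights: List[float], threshold: float) -> List[int]:
    """Ternary precision: zero weights with |w| <= threshold, keep the sign of the rest."""
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]


w = [0.3, -0.2, 0.05, -0.6]
print(binarize(w))
print(ternarize(w, threshold=0.1))
```

With only {-1, +1} or {-1, 0, +1} values, each multiplication reduces to a sign flip, a pass-through, or a skip. This is where the resource savings on an FPGA come from.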

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

FINN, a framework for building fast and flexible FPGA accelerators using a flexible heterogeneous streaming architecture that implements fully connected, convolutional and pooling layers, with per-layer compute resources being tailored to user-provided throughput requirements is presented.
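In a binarized network, the dot product of two {-1, +1} vectors can be computed without multipliers. If each value is stored as one bit, XNOR marks the positions where the signs agree, and popcount counts them.

The block below illustrates that general identity: dot = 2 * matches - n. It does not use FINN's API, and `pack_bits` and `binary_dot` are hypothetical helper names.

```python
from typing import List


def pack_bits(vec: List[int]) -> int:
    """Encode a {-1, +1} vector as an integer: bit i is 1 when vec[i] == +1."""
    mask = 0
    for i, v in enumerate(vec):
        if v == 1:
            mask |= 1 << i
        elif v != -1:
            raise ValueError("binarized vectors may only contain -1 and +1")
    return mask


def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two packed {-1, +1} vectors of length n via XNOR + popcount."""
    agree = ~(a_bits ^ b_bits) & ((1 << n) - 1)  # XNOR, restricted to the low n bits
    matches = bin(agree).count("1")               # popcount
    return 2 * matches - n                        # agreements add +1, disagreements -1


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
print(binary_dot(pack_bits(a), pack_bits(b), len(a)))
print(sum(x * y for x, y in zip(a, b)))  # reference result using ordinary multiplication
```

In hardware, the XNOR and popcount map onto LUTs and adder trees rather than DSP multipliers. This mapping is the main reason binarized layers are cheap on FPGAs.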

FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates

Yijin Guan, Hao Liang, J. Cong. 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2017.
FP-DNN (Field Programmable DNN), an end-to-end framework that takes TensorFlow-described DNNs as input, and automatically generates the hardware implementations on FPGA boards with RTL-HLS hybrid templates, is proposed.

fpgaConvNet: A Framework for Mapping Convolutional Neural Networks on FPGAs

Convolutional Neural Networks (ConvNets) are a powerful Deep Learning model, providing state-of-the-art accuracy to many emerging classification problems. However, ConvNet classification is a…

Efficient FPGA acceleration of Convolutional Neural Networks using logical-3D compute array

This paper presents a flexible yet highly efficient 3D neuron array architecture that is a natural fit for convolutional layers. It also presents a technique to optimize the array's parameters, including on-chip buffer sizes, for a given set of resource constraints on modern FPGAs.

Toolflows for Mapping Convolutional Neural Networks on FPGAs

A survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics, which include the supported applications, architectural choices, design space exploration methods, and achieved performance.

From high-level deep neural models to FPGAs

DnnWeaver is a framework that automatically generates a synthesizable accelerator for a given (DNN, FPGA) pair from a high-level specification in Caffe. The generated accelerator best matches the needs of the DNN while providing high performance and efficiency gains on the target FPGAs.

fpgaConvNet: Automated Mapping of Convolutional Neural Networks on FPGAs (Abstract Only)

In recent years, Convolutional Neural Networks (ConvNets) have become the state-of-the-art in several Artificial Intelligence tasks. Across the range of applications, the performance needs vary…

Latency-driven design for FPGA-based convolutional neural networks

A latency-driven design methodology for mapping ConvNets on FPGAs that employs novel transformations over a Synchronous Dataflow-based modelling framework together with a latency-centric optimisation procedure in order to efficiently explore the design space targeting low-latency designs.

Caffeinated FPGAs: FPGA framework For Convolutional Neural Networks

This work presents a modified version of the popular CNN framework Caffe with FPGA support. It allows classification using CNN models and specialized FPGA implementations, with the flexibility of reprogramming the device when necessary, seamless memory transactions between host and device, simple-to-use test benches, and the ability to create pipelined layer implementations.
...