Real-time semantic segmentation on FPGAs for autonomous vehicles with hls4ml

@article{Ghielmetti2022RealtimeSS,
  title={Real-time semantic segmentation on FPGAs for autonomous vehicles with hls4ml},
  author={Nicol{\`o} Ghielmetti and Vladimir Loncar and Maurizio Pierini and Marcel Roed and Sioni Summers and Thea Klaeboe Aarrestad and Christoffer Petersson and Hampus Linander and Jennifer Ngadiuba and Kelvin Lin and Philip C. Harris},
  journal={ArXiv},
  year={2022},
  volume={abs/2205.07690}
}
In this paper, we investigate how field programmable gate arrays can serve as hardware accelerators for real-time semantic segmentation tasks relevant for autonomous driving. Considering compressed versions of the ENet convolutional neural network architecture, we demonstrate a fully-on-chip deployment with a latency of 4.9 ms per image, using less than 30% of the available resources on a Xilinx ZCU102 evaluation board. The latency is reduced to 3 ms per image when increasing the batch size to… 

References

Design and Implementation of Real-time Semantic Segmentation Network Based on FPGA
TLDR
A lightweight semantic segmentation network, Efficient Neural Network (E-Net), is designed and implemented on an image acquisition board with a Zynq 7035 FPGA as the processing unit, meeting the requirements of real-time processing.
ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
TLDR
A novel deep neural network architecture named ENet (efficient neural network), created specifically for tasks requiring low latency operation, which is up to 18 times faster, requires 75% fewer FLOPs, has 79% fewer parameters, and provides similar or better accuracy than existing models.
Model compression and simplification pipelines for fast deep neural network inference in FPGAs in HEP
TLDR
A multi-stage compression approach is presented, based on conventional compression strategies (pruning and quantization) to reduce the memory footprint of the model, and on knowledge transfer techniques, which are crucial to streamline the DNNs, simplifying the synthesis phase in the FPGA firmware and improving explainability.
Compressing deep neural networks on FPGAs to binary and ternary precision with hls4ml
We present the implementation of binary and ternary neural networks in the hls4ml library, designed to automatically convert deep neural network models to digital circuits with field-programmable
Distance-Weighted Graph Neural Networks on FPGAs for Real-Time Particle Reconstruction in High Energy Physics
TLDR
A representative task associated with particle reconstruction and identification in a next-generation calorimeter operating at a particle collider is considered; a graph network architecture developed for such purposes is used, with additional simplifications applied to match the computing constraints of Level-1 trigger systems.
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
TLDR
This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.
Accelerated Charged Particle Tracking with Graph Neural Networks on FPGAs
TLDR
A considerable speedup over CPU-based execution is possible, potentially enabling such algorithms to be used effectively in future computing workflows and the FPGA-based Level-1 trigger at the CERN Large Hadron Collider.
hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices
TLDR
hls4ml, an open-source software-hardware co-design workflow to interpret and translate machine learning algorithms for implementation in FPGAs and ASICs specifically to support domain scientists, is developed.
The Cityscapes Dataset for Semantic Urban Scene Understanding
TLDR
This work introduces Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling, and exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity.
Fast inference of deep neural networks in FPGAs for particle physics
TLDR
A case study for neural network inference in FPGAs focusing on a classifier for jet substructure which would enable, among many other physics scenarios, searches for new dark sector particles and novel measurements of the Higgs boson.