RapidLayout: Fast Hard Block Placement of FPGA-Optimized Systolic Arrays using Evolutionary Algorithms

@inproceedings{Zhang2020RapidLayoutFH,
  title={RapidLayout: Fast Hard Block Placement of FPGA-Optimized Systolic Arrays using Evolutionary Algorithms},
  author={Niansong Zhang and Xiang Chen and Nachiket Kapre},
  booktitle={2020 30th International Conference on Field-Programmable Logic and Applications (FPL)},
  year={2020},
  pages={145--152}
}
  • Niansong Zhang, Xiang Chen, N. Kapre
  • Published 17 February 2020
  • Computer Science
  • 2020 30th International Conference on Field-Programmable Logic and Applications (FPL)
Evolutionary algorithms can outperform conventional placement algorithms such as simulated annealing and analytical placement, as well as manual placement, on metrics such as runtime, wirelength, pipelining cost, and clock frequency when mapping hard-block-intensive FPGA designs such as systolic arrays onto Xilinx UltraScale+ FPGAs. For certain hard-block-intensive systolic array accelerator designs, the commercial-grade Xilinx Vivado CAD tool is unable to provide a legal routing solution without…
How Much Does Regularity Help FPGA Placement?
TLDR
This work proposes a regularity-aware approach to FPGA placement that achieves a 2X to 28X speedup over Versatile Place and Route (VPR) with limited loss in circuit performance.
Reduced-Precision Acceleration of Radio-Astronomical Imaging on Reconfigurable Hardware
TLDR
This work presents a reduced-precision implementation of the gridding component of the widely used WSClean imaging application and proposes the first custom floating-point accelerator on a Xilinx Alveo U50 FPGA using High-Level Synthesis.
RapidStream: Parallel Physical Implementation of FPGA HLS Designs
TLDR
This paper proposes a split compilation approach based on the pipelining flexibility at the HLS level, which allows designs to be partitioned for parallel placement and routing and the separate partitions then stitched together to achieve fast end-to-end compilation.
Mocarabe: High-Performance Time-Multiplexed Overlays for FPGAs
TLDR
This work treats data movement as a first-class citizen by encoding the space and time resources of the overlay's communication network in the ILP formulation, and outperforms a PathFinder space-time router implementation in quality of result by up to nearly 2×.

References

RapidWright: Enabling Custom Crafted Implementations for FPGAs
  • C. Lavin, A. Kaviani
  • Computer Science
    2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)
  • 2018
TLDR
This work proposes a pre-implemented methodology for FPGAs to achieve higher performance or productivity and introduces RapidWright, an open-source platform to enable this new approach.
UTPlaceF: A Routability-Driven FPGA Placer With Physical and Congestion Aware Packing
  • Wuxi Li, Shounak Dhar, D. Pan
  • Computer Science
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
  • 2018
TLDR
This work proposes UTPlaceF, an FPGA packing and placement engine that simultaneously optimizes wirelength and routability, along with a novel physical and congestion aware packing algorithm and a hierarchical detailed placement technique.
VTR 7.0: Next Generation Architecture and CAD System for FPGAs
TLDR
This work describes recent advances in the open-source Verilog-to-Routing (VTR) CAD flow that enable further research in these areas, and releases new FPGA architecture files and models that are much closer to modern commercial architectures, enabling more realistic experiments.
Scaling the Cascades: Interconnect-Aware FPGA Implementation of Machine Learning Problems
TLDR
This work reformulates convolution and matrix-vector multiplication operations to make effective use of cascade interconnect in DSP48s for supporting the common multiply-accumulate chains, and in BRAMs and URAMs to exploit the data movement and reuse patterns of ML workloads.
Opt4J: a modular framework for meta-heuristic optimization
TLDR
A modular framework for meta-heuristic optimization that decomposes complex optimization tasks into subtasks that may be designed and developed separately, enabling maximal decoupling and flexibility.
1.1 The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design
  • J. Dean
  • Computer Science
    2020 IEEE International Solid-State Circuits Conference (ISSCC)
  • 2020
TLDR
This paper provides a sketch of at least one interesting direction towards much larger-scale multi-task models that are sparsely activated and employ much more dynamic, example- and task-based routing than the machine learning models of today.
Solving NP hard Problems using Genetic Algorithm
TLDR
This paper presents how to find an optimal solution using various methods of the genetic algorithm, an iterative search, optimization, and adaptive machine learning technique premised on the principles of natural selection.
Deep Reinforcement Learning for Multiobjective Optimization
TLDR
The proposed DRL-MOA method provides a new way of solving the MOP by means of DRL that has shown a set of new characteristics, for example, strong generalization ability and fast solving speed in comparison with the existing methods for multiobjective optimizations.
NSGA-Net: neural architecture search using multi-objective genetic algorithm
TLDR
Experimental results suggest that combining the dual objectives of minimizing an error metric and computational complexity, as measured by FLOPs, allows NSGA-Net to find competitive neural architectures.
Compute-Efficient Neural-Network Acceleration
TLDR
This work outlines a convolutional neural network accelerator that operates at 92.9% of the peak FPGA clock rate, and maps neural-network operators to a minimalist hardware architecture to simplify data movement between memory and compute.