DREAMPlace: Deep Learning Toolkit-Enabled GPU Acceleration for Modern VLSI Placement

@article{Lin2021DREAMPlaceDL,
  title={DREAMPlace: Deep Learning Toolkit-Enabled GPU Acceleration for Modern VLSI Placement},
  author={Yibo Lin and Zixuan Jiang and Jiaqi Gu and Wuxi Li and Shounak Dhar and Haoxing Ren and Brucek Khailany and David Z. Pan},
  journal={IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems},
  year={2021},
  volume={40},
  pages={748-761}
}
  • Yibo Lin, Zixuan Jiang, Jiaqi Gu, Wuxi Li, Shounak Dhar, Haoxing Ren, Brucek Khailany, David Z. Pan
  • Published 22 June 2020
  • Computer Science
  • IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Placement for very large-scale integrated (VLSI) circuits is one of the most important steps for design closure. We propose a novel GPU-accelerated placement framework, DREAMPlace, by casting the analytical placement problem equivalently to training a neural network. Implemented on top of the widely adopted deep learning toolkit PyTorch, with customized key kernels for wirelength and density computations, DREAMPlace can achieve around …
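The core idea of the abstract — treating cell coordinates as the trainable parameters of a "network" and minimizing a differentiable wirelength proxy by gradient descent — can be illustrated with a minimal sketch. This is not the DREAMPlace code (which uses customized GPU kernels under PyTorch); it is a numpy stand-in using the log-sum-exp wirelength smoothing common in analytical placement, with a toy 3-cell, 2-net netlist, step size, and gamma invented for the example.

```python
import numpy as np

def wa_wirelength(x, nets, gamma=0.5):
    """Smooth log-sum-exp approximation of per-net wirelength along one
    axis; differentiable everywhere, unlike max(x) - min(x)."""
    total = 0.0
    for pins in nets:
        xs = x[pins]
        total += gamma * (np.log(np.sum(np.exp(xs / gamma)))
                          + np.log(np.sum(np.exp(-xs / gamma))))
    return total

def grad(x, nets, gamma=0.5):
    """Analytic gradient: softmax weights approximate d(max)/dx,
    softmin weights approximate d(min)/dx."""
    g = np.zeros_like(x)
    for pins in nets:
        xs = x[pins]
        p = np.exp(xs / gamma);  p /= p.sum()
        m = np.exp(-xs / gamma); m /= m.sum()
        g[pins] += p - m
    return g

# Toy netlist: 3 movable cells, 2 nets given as arrays of cell indices.
nets = [np.array([0, 1]), np.array([1, 2])]
x = np.array([0.0, 5.0, 1.0])        # initial x-coordinates ("parameters")

for _ in range(200):                 # the "training" loop
    x -= 0.1 * grad(x, nets)         # plain gradient-descent step

# Cells sharing nets are pulled together, shrinking total wirelength.
```

In the real framework the same structure holds, but the objective also includes a density penalty, the loop is driven by a PyTorch optimizer, and the wirelength/density terms are evaluated by custom CUDA kernels.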
DREAMPlace 2.0: Open-Source GPU-Accelerated Global and Detailed Placement for Large-Scale VLSI Designs
Modern backend design flow for very-large-scale-integrated (VLSI) circuits consists of many complicated stages and requires long turn-around time. Among these stages, VLSI placement plays a …
CU.POKer: Placing DNNs on Wafer-Scale AI Accelerator with Optimal Kernel Sizing
TLDR
CU.POKer, a high-performance engine fully customized for the WSE's DNN workload placement challenge, is proposed; it combines a provably optimal placeable-kernel candidate search scheme with a data-flow-aware placement tool to ensure state-of-the-art quality on real industrial benchmarks.
ABCDPlace: Accelerated Batch-Based Concurrent Detailed Placement on Multithreaded CPUs and GPUs
TLDR
This article presents a concurrent detailed placement framework, ABCDPlace, exploiting multithreading and graphics processing unit (GPU) acceleration, and proposes batch-based concurrent algorithms for widely adopted sequential detailed placement techniques such as independent set matching, global swap, and local reordering.
Opportunities for RTL and Gate Level Simulation using GPUs (Invited Talk)
TLDR
Presents the idea that coding frameworks usually used for popular machine learning topics, such as PyTorch/DGL.ai, can also be used for exploring simulation; a crude oblivious two-value cycle-based gate-level simulator is demonstrated that exhibits >20X speedup despite its simplistic construction.
On Joint Learning for Solving Placement and Routing in Chip Design
  • Ruoyu Cheng, Junchi Yan
  • Computer Science
  • ArXiv
  • 2021
TLDR
A joint learning method termed DeepPlace is proposed for the placement of macros and standard cells, integrating reinforcement learning with a gradient-based optimization scheme; a joint learning approach via reinforcement learning that fulfills both macro placement and routing, called DeepPR, is also proposed.
GPU Acceleration in VLSI Back-end Design: Overview and Case Studies
  • Yibo Lin
  • Computer Science
  • 2020 IEEE/ACM International Conference On Computer Aided Design (ICCAD)
  • 2020
TLDR
This tutorial summarizes the challenges in key design stages such as placement, routing, and timing analysis, and provides several case studies on how to enable massive parallelism in practice.
GPU-Accelerated Static Timing Analysis
TLDR
This paper develops GPU-efficient data structures and high-performance kernels to speed up various tasks of STA, including levelization, delay calculation, and graph update, and proposes an efficient implementation for accelerating STA on a GPU.
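Levelization, named above as one of the accelerated STA tasks, groups the gates of a timing DAG into levels so that all gates in one level can have their delays computed in parallel. A small CPU-side sketch of the idea (Kahn-style topological traversal; the toy gate graph is invented for illustration, and a GPU engine would batch each level into one kernel launch):

```python
from collections import deque

def levelize(fanout, num_nodes):
    """Assign each node of a DAG a level = longest distance from any
    source node; nodes sharing a level have no mutual dependencies,
    so their delay calculations can run concurrently."""
    indeg = [0] * num_nodes
    for u in range(num_nodes):
        for v in fanout.get(u, []):
            indeg[v] += 1
    level = [0] * num_nodes
    q = deque(u for u in range(num_nodes) if indeg[u] == 0)
    while q:
        u = q.popleft()
        for v in fanout.get(u, []):
            level[v] = max(level[v], level[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)
    return level

# Tiny gate graph: 0 and 1 are primary inputs; 2 = AND(0, 1); 3 = INV(2).
print(levelize({0: [2], 1: [2], 2: [3]}, 4))   # [0, 0, 1, 2]
```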
Global Placement with Deep Learning-Enabled Explicit Routability Optimization
TLDR
A fully convolutional network model is proposed to predict congestion hotspots; this prediction model is then incorporated into a placement engine, DREAMPlace, to obtain a more route-friendly result.
VLSI Placement Optimization using Graph Neural Networks
Placement is one of the most crucial problems in modern Electronic Design Automation (EDA) flows, where the solution quality is mainly dominated by on-chip interconnects. To achieve target closures, …
DREAMPlace 3.0: Multi-Electrostatics Based Robust VLSI Placement with Region Constraints
TLDR
This work proposes a versatile and robust placer to solve region-constrained placement problems with better solution quality and faster convergence, adopting self-adaptive quadratic density penalty and entropy injection techniques to automatically accelerate and stabilize the nonlinear optimization.

References

Showing 1-10 of 39 references
DREAMPlace: Deep Learning Toolkit-Enabled GPU Acceleration for Modern VLSI Placement
TLDR
A novel GPU-accelerated placement framework DREAMPlace is proposed, by casting the analytical placement problem equivalently to training a neural network, to achieve over 30 times speedup in global placement without quality degradation compared to the state-of-the-art multi-threaded placer RePlAce.
ABCDPlace: Accelerated Batch-Based Concurrent Detailed Placement on Multithreaded CPUs and GPUs
TLDR
This article presents a concurrent detailed placement framework, ABCDPlace, exploiting multithreading and graphics processing unit (GPU) acceleration, and proposes batch-based concurrent algorithms for widely adopted sequential detailed placement techniques such as independent set matching, global swap, and local reordering.
PyTorch: An Imperative Style, High-Performance Deep Learning Library
TLDR
This paper details the principles that drove the implementation of PyTorch and how they are reflected in its architecture, and explains how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance.
Parallel multi-level analytical global placement on graphics processing units
  • J. Cong, Yi Zou
  • Computer Science
  • 2009 IEEE/ACM International Conference on Computer-Aided Design - Digest of Technical Papers
  • 2009
TLDR
This paper describes the implementation of a state-of-the-art academic multi-level analytical placer, mPL, on Nvidia's massively parallel GT200 series platforms, and details the efforts on performance tuning and optimizations.
GDP: GPU accelerated Detailed Placement
  • S. Dhar, D. Pan
  • Computer Science
  • 2018 IEEE High Performance extreme Computing Conference (HPEC)
  • 2018
TLDR
This paper demonstrates GPU acceleration of a dynamic-programming-based detailed placement algorithm which solves a generalized version of the Linear Arrangement Problem, achieving up to 7x speedup in runtime over a multi-threaded CPU implementation without any loss of QoR.
Accelerate analytical placement with GPU: A generic approach
  • Chun-Xun Lin, M. Wong
  • Computer Science
  • 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE)
  • 2018
TLDR
A generic approach to exploiting GPU parallelism to speed up the essential computations in VLSI nonlinear analytical placement, utilizing the sparse characteristic of the circuit graph to transform the compute-intensive portions into sparse matrix multiplications, which effectively optimizes the memory access pattern and mitigates workload imbalance.
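The transformation described above — recasting per-net accumulations over a sparse circuit graph as sparse matrix products — can be sketched in a few lines. This is an illustrative scipy example with an invented 3-net, 4-cell netlist, not the paper's GPU implementation:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Net-pin incidence matrix: rows = nets, cols = cells,
# A[n, c] = 1 iff cell c has a pin on net n.  Circuit graphs are sparse,
# so CSR storage skips the (mostly zero) entries entirely.
rows = [0, 0, 1, 1, 1, 2, 2]
cols = [0, 1, 1, 2, 3, 0, 3]
A = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(3, 4))

x = np.array([1.0, 4.0, 2.0, 0.0])    # cell x-coordinates

# Per-net pin-coordinate sums (used, e.g., by star/clique wirelength
# models) become one sparse matrix-vector product instead of a loop
# over nets -- a regular, parallelism-friendly memory access pattern:
net_sums = A @ x                      # array([5., 6., 1.])
```

On a GPU the same product maps onto well-tuned sparse-matrix kernels, which is the essence of the memory-access and load-balancing benefit the summary describes.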
UTPlaceF 3.0: A parallelization framework for modern FPGA global placement: (Invited paper)
TLDR
A parallelization framework for modern FPGA global placement, UTPlaceF 3.0, is proposed, and two major techniques are presented to boost the performance of a state-of-the-art quadratic placer with only small quality degradation.
High-quality, deterministic parallel placement for FPGAs on commodity hardware
TLDR
This paper describes the application of two parallelization strategies to the Quartus II FPGA placer, along with a process to quantify multi-core performance effects, such as memory subsystem limitations and explicit synchronization overhead, fully describing these effects on a CAD tool for the first time.
MAPLE: multilevel adaptive placement for mixed-size designs
We propose a new multilevel framework for large-scale placement called MAPLE that respects utilization constraints, handles movable macros and guides the transition between global and detailed …
elfPlace: Electrostatics-based Placement for Large-Scale Heterogeneous FPGAs
  • Wuxi Li, Yibo Lin, D. Pan
  • Computer Science
  • 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)
  • 2019
TLDR
Besides pure-wirelength minimization, this work proposes a unified instance area adjustment scheme to simultaneously optimize routability, pin density, and downstream clustering compatibility; an augmented Lagrangian formulation, together with a preconditioning technique and a normalized subgradient-based multiplier updating scheme, is also proposed.