Data-Driven Offline Optimization For Architecting Hardware Accelerators
@article{Kumar2021DataDrivenOO,
  title={Data-Driven Offline Optimization For Architecting Hardware Accelerators},
  author={Aviral Kumar and Amir Yazdanbakhsh and Milad Hashemi and Kevin Swersky and Sergey Levine},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.11346}
}
Industry has gradually moved towards application-specific hardware accelerators in order to attain higher efficiency. While such a paradigm shift is already starting to show promising results, designers need to spend considerable manual effort and perform a large number of time-consuming simulations to find accelerators that can accelerate multiple target applications while obeying design constraints. Moreover, such a “simulation-driven” approach must be re-run from scratch every time the set…
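The data-driven alternative the abstract contrasts with simulation-driven search can be sketched in a few lines: fit a learned surrogate on previously logged (configuration, objective) pairs, then optimize against the surrogate alone, with no further simulator calls. The snippet below is a minimal, hypothetical illustration; the two-parameter config space, the analytic latency model, and the ridge surrogate are all assumptions for exposition, not the paper's actual PRIME method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical offline log: accelerator configs (num_PEs, buffer_KB) scored by
# an expensive simulator at data-collection time. The latency model is made up.
configs = rng.uniform([16, 32], [256, 1024], size=(500, 2))
latency = 1e4 / configs[:, 0] + 5e3 / configs[:, 1] + rng.normal(0, 0.5, 500)

# Learned surrogate: ridge regression on simple features of the config.
def features(x):
    return np.column_stack([np.ones(len(x)), x, 1.0 / x])

X = features(configs)
w = np.linalg.solve(X.T @ X + 1e-3 * np.eye(X.shape[1]), X.T @ latency)

# "Offline" optimization: score a large candidate pool with the surrogate only;
# the simulator is never invoked again.
candidates = rng.uniform([16, 32], [256, 1024], size=(20000, 2))
best = candidates[np.argmin(features(candidates) @ w)]
print(f"surrogate-optimal config: {best[0]:.0f} PEs, {best[1]:.0f} KB buffer")
```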
References
Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture
- 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)
- 2018
The composable, parallel and pipeline (CPP) microarchitecture is proposed as an accelerator design template to substantially reduce the design space, and the AutoAccel framework is developed to automate the entire accelerator generation process.
Timeloop: A Systematic Approach to DNN Accelerator Evaluation
- 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
- 2019
Timeloop's underlying models and algorithms are described in detail, and results from case studies enabled by Timeloop are shown, revealing that dataflow and memory hierarchy co-design plays a critical role in optimizing energy efficiency.
Spatial: a language and compiler for application accelerators
- Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation
- 2018
This work describes Spatial, a new domain-specific language and compiler for higher-level descriptions of application accelerators, and summarizes the compiler passes required to support these abstractions, including pipeline scheduling, automatic memory banking, and automated design tuning driven by active machine learning.
Understanding Reuse, Performance, and Hardware Cost of DNN Dataflow: A Data-Centric Approach
- MICRO
- 2019
This work introduces a set of data-centric directives to concisely specify the DNN dataflow space in a compiler-friendly form and codifies this analysis into an analytical cost model, MAESTRO (Modeling Accelerator Efficiency via Spatio-Temporal Reuse and Occupancy), that estimates the cost-benefit trade-offs of a dataflow, including execution time and energy efficiency, for a given DNN model and hardware configuration.
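For intuition about what an analytical cost model of this kind provides, the sketch below estimates execution time and energy from a handful of hardware parameters without any simulation. The roofline-style formulas and energy coefficients are generic illustrative assumptions, not MAESTRO's actual reuse analysis.

```python
# Generic roofline-style estimate: time is the slower of compute and memory,
# energy is a per-MAC plus per-byte sum. All coefficients are illustrative.
def estimate_cost(macs, bytes_moved, num_pes, freq_ghz, dram_gbps,
                  pj_per_mac=0.5, pj_per_byte=20.0):
    compute_s = macs / (num_pes * freq_ghz * 1e9)   # peak-compute bound
    memory_s = bytes_moved / (dram_gbps * 1e9)      # bandwidth bound
    time_s = max(compute_s, memory_s)               # roofline: slower side wins
    energy_j = (macs * pj_per_mac + bytes_moved * pj_per_byte) * 1e-12
    return time_s, energy_j

# Example: a layer with ~2 GMACs and 50 MB of DRAM traffic on 256 PEs.
t, e = estimate_cost(macs=2.0e9, bytes_moved=5.0e7, num_pes=256,
                     freq_ghz=1.0, dram_gbps=64.0)
print(f"{t * 1e3:.2f} ms, {e * 1e3:.2f} mJ")
```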
Apollo: Transferable Architecture Exploration
- ArXiv
- 2021
This work proposes a transferable architecture exploration framework, dubbed APOLLO, that leverages recent advances in black-box function optimization for sample-efficient accelerator design and uses this framework to optimize accelerator configurations of a diverse set of neural architectures with alternative design constraints.
ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning
- 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
- 2020
ConfuciuX demonstrates the highest sample efficiency for training compared to other techniques such as Bayesian optimization, genetic algorithms, simulated annealing, and other RL methods, and converges to the optimized hardware configuration 4.7 to 24 times faster than alternate techniques.
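A toy version of RL-driven resource assignment can be written as a REINFORCE loop over discrete hardware choices. Everything below (the PE/buffer option lists, the analytic reward, the cost budget) is a made-up stand-in for ConfuciuX's actual environment, intended only to show the shape of the approach.

```python
import numpy as np

rng = np.random.default_rng(0)
pe_opts = np.array([64, 128, 256, 512])        # hypothetical PE counts
buf_opts = np.array([128, 256, 512, 1024])     # hypothetical buffer sizes (KB)
logits_pe = np.zeros(4)                        # learned categorical policy
logits_buf = np.zeros(4)

def softmax(z):
    p = np.exp(z - z.max())
    return p / p.sum()

def reward(pes, buf):
    latency = 1e4 / pes + 4e3 / buf            # made-up latency model
    cost = 0.01 * pes + 0.004 * buf            # made-up area/power cost
    return -latency - (100.0 if cost > 6.0 else 0.0)  # budget as a penalty

lr, baseline = 0.1, 0.0
for _ in range(2000):
    p_pe, p_buf = softmax(logits_pe), softmax(logits_buf)
    i, j = rng.choice(4, p=p_pe), rng.choice(4, p=p_buf)
    r = reward(pe_opts[i], buf_opts[j])
    baseline += 0.05 * (r - baseline)          # moving-average baseline
    # REINFORCE: grad of log softmax(logits)[i] is (one_hot(i) - p)
    logits_pe += lr * (r - baseline) * (np.eye(4)[i] - p_pe)
    logits_buf += lr * (r - baseline) * (np.eye(4)[j] - p_buf)

print("learned assignment:",
      pe_opts[logits_pe.argmax()], "PEs,", buf_opts[logits_buf.argmax()], "KB")
```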
MARVEL: A Decoupled Model-driven Approach for Efficiently Mapping Convolutions on Spatial DNN Accelerators
- ArXiv
- 2020
This work proposes a decoupled off-chip/on-chip approach that decomposes the mapping space into off-chip and on-chip subspaces and optimizes the off-chip subspace before the on-chip one; the mapping-space formulation considers dimension permutation, a form of data layout, alongside loop transformations.
Learned Hardware/Software Co-Design of Neural Accelerators
- ArXiv
- 2020
This paper proposes a new constrained Bayesian optimization framework that avoids invalid solutions by exploiting the highly constrained, semi-continuous/semi-discrete structure of the hardware/software co-design space.
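A minimal sketch of the constraint-aware idea, assuming a 1-D toy problem: a Gaussian-process surrogate ranks candidates by expected improvement, and a validity predicate filters out invalid designs before they are ever evaluated. The objective, the feasibility rule, and all constants are illustrative assumptions, not the paper's framework.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def objective(x):                  # stand-in for an expensive simulator
    return np.sin(3 * x) + 0.5 * x

def is_valid(x):                   # stand-in for design-validity checks
    return (x % 0.25) < 0.2        # e.g. semi-discrete feasible bands

X = np.array([0.1, 1.0, 2.1])      # initial valid evaluations
y = objective(X)

def rbf(a, b, ls=0.3):             # squared-exponential GP kernel
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

for _ in range(15):
    K = rbf(X, X) + 1e-6 * np.eye(len(X))
    cand = rng.uniform(0, 3, 256)
    cand = cand[is_valid(cand)]                  # never evaluate invalid designs
    Ks = rbf(cand, X)
    mu = Ks @ np.linalg.solve(K, y)              # GP posterior mean
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    sd = np.sqrt(np.maximum(var, 1e-9))
    z = (y.min() - mu) / sd                      # expected improvement (minimize)
    ei = sd * (z * norm.cdf(z) + norm.pdf(z))
    x_next = cand[ei.argmax()]
    X, y = np.append(X, x_next), np.append(y, objective(x_next))

print(f"best valid design: x={X[y.argmin()]:.3f}, f={y.min():.3f}")
```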
dMazeRunner: Executing Perfectly Nested Loops on Dataflow Accelerators
- ACM Trans. Embed. Comput. Syst.
- 2019
dMazeRunner is proposed to efficiently and accurately explore the vast space of ways to spatiotemporally execute a perfectly nested loop on dataflow accelerators (execution methods); the solutions it discovers are on average 9.16× better in Energy-Delay Product (EDP) and 5.83× better in execution time than prior approaches.
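For reference, the Energy-Delay Product quoted above is simply the product of an execution method's energy and its runtime, so it penalizes designs that are fast but power-hungry as well as frugal but slow. The numbers below are made up for illustration:

```python
def edp(energy_mj, delay_ms):
    return energy_mj * delay_ms                # lower is better on both axes

baseline = edp(energy_mj=120.0, delay_ms=8.0)  # 960.0
tuned = edp(energy_mj=45.0, delay_ms=3.5)      # 157.5
print(f"EDP improvement: {baseline / tuned:.2f}x")  # ~6.10x
```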
A case for efficient accelerator design space exploration via Bayesian optimization
- 2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)
- 2017
This paper shows how to adapt multi-objective Bayesian optimization to overcome a challenging design problem: optimizing deep neural network hardware accelerators for both accuracy and energy efficiency.
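Because accuracy and energy pull in opposite directions, multi-objective optimization of this kind typically reports a Pareto front rather than a single winner. The sketch below computes that front for a set of hypothetical (accuracy, energy) design points:

```python
import numpy as np

rng = np.random.default_rng(0)
acc = rng.uniform(0.85, 0.99, 50)       # accuracy, higher is better
energy = rng.uniform(1.0, 10.0, 50)     # mJ/inference, lower is better

def pareto_mask(acc, energy):
    keep = np.ones(len(acc), dtype=bool)
    for i in range(len(acc)):
        # i is dominated if another point is no worse on both objectives
        # and strictly better on at least one
        dom = (acc >= acc[i]) & (energy <= energy[i]) & \
              ((acc > acc[i]) | (energy < energy[i]))
        keep[i] = not dom.any()
    return keep

front = pareto_mask(acc, energy)
for a, e in sorted(zip(acc[front], energy[front])):
    print(f"accuracy={a:.3f}  energy={e:.2f} mJ")
```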