Corpus ID: 239616399

Data-Driven Offline Optimization For Architecting Hardware Accelerators

Aviral Kumar, Amir Yazdanbakhsh, Milad Hashemi, Kevin Swersky, Sergey Levine
Industry has gradually moved towards application-specific hardware accelerators to attain higher efficiency. While this paradigm shift is already showing promising results, designers must spend considerable manual effort and run a large number of time-consuming simulations to find accelerator designs that speed up multiple target applications while obeying design constraints. Moreover, such a "simulation-driven" approach must be re-run from scratch every time the set…
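The core alternative the paper argues for can be sketched in a few lines: instead of launching new simulations, fit a surrogate to previously logged (design, objective) pairs and optimize over that surrogate. The sketch below uses a 1-nearest-neighbour surrogate and invented (PE count, buffer KiB) → latency data purely for illustration; the paper's actual method and dataset differ.

```python
# Offline, data-driven optimization sketch: reuse logged simulation data
# rather than running the simulator again. All designs and latencies here
# are illustrative, not from the paper.

def fit_surrogate(dataset):
    """Return a 1-nearest-neighbour surrogate over logged designs."""
    def predict(design):
        nearest = min(dataset,
                      key=lambda d: sum((a - b) ** 2 for a, b in zip(d[0], design)))
        return nearest[1]
    return predict

def optimize_offline(dataset, candidates):
    """Pick the candidate design the surrogate scores best (lower = better)."""
    surrogate = fit_surrogate(dataset)
    return min(candidates, key=surrogate)

# Logged simulations: (PE count, buffer KiB) -> latency (made-up numbers).
logged = [((16, 64), 9.1), ((32, 64), 5.2), ((32, 128), 4.8), ((64, 128), 6.0)]
candidates = [(16, 64), (32, 64), (32, 128), (64, 128), (48, 96)]
best = optimize_offline(logged, candidates)
print(best)
```

The point of the sketch is the workflow, not the model class: no new simulation is invoked while searching, so adding a target application only means appending its logged data.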


Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture
The composable, parallel and pipeline (CPP) microarchitecture is proposed as an accelerator design template that substantially reduces the design space, and the AutoAccel framework is developed to automate the entire accelerator generation process.
Timeloop: A Systematic Approach to DNN Accelerator Evaluation
Timeloop's underlying models and algorithms are described in detail and results from case studies enabled by Timeloop are shown, which reveal that dataflow and memory hierarchy co-design plays a critical role in optimizing energy efficiency.
Spatial: a language and compiler for application accelerators
This work describes a new domain-specific language and compiler called Spatial for higher level descriptions of application accelerators, and summarizes the compiler passes required to support these abstractions, including pipeline scheduling, automatic memory banking, and automated design tuning driven by active machine learning.
Understanding Reuse, Performance, and Hardware Cost of DNN Dataflow: A Data-Centric Approach
This work introduces a set of data-centric directives to concisely specify the DNN dataflow space in a compiler-friendly form and codifies this analysis into an analytical cost model, MAESTRO (Modeling Accelerator Efficiency via Spatio-Temporal Reuse and Occupancy), that estimates various cost-benefit tradeoffs of a dataflow, including execution time and energy efficiency, for a DNN model and hardware configuration.
Apollo: Transferable Architecture Exploration
This work proposes a transferable architecture exploration framework, dubbed APOLLO, that leverages recent advances in black-box function optimization for sample-efficient accelerator design and uses this framework to optimize accelerator configurations of a diverse set of neural architectures with alternative design constraints.
ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning
ConfuciuX demonstrates the highest sample-efficiency for training compared to other techniques such as Bayesian optimization, genetic algorithms, simulated annealing, and other RL methods, and converges to the optimized hardware configuration 4.7 to 24 times faster than alternate techniques.
MARVEL: A Decoupled Model-driven Approach for Efficiently Mapping Convolutions on Spatial DNN Accelerators
This work proposes a decoupled off-chip/on-chip approach that decomposes the mapping space into off-chip and on-chip subspaces, first optimizing the off-chip subspace and then the on-chip subspace, and considers dimension permutation, a form of data layout, in the mapping-space formulation along with the loop transformations.
Learned Hardware/Software Co-Design of Neural Accelerators
This paper proposes a new constrained Bayesian optimization framework that avoids invalid solutions by exploiting the highly constrained features of this design space, which are semi-continuous/semi-discrete.
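The key idea of screening out invalid points before spending evaluations on them can be sketched simply: filter a candidate grid through a validity predicate, then score only the feasible designs. The constraint (buffer KiB ≤ 4 × PEs) and the cost function below are invented stand-ins, not the paper's actual constraint model or Bayesian optimization machinery.

```python
# Hedged sketch of constrained design-space search: reject infeasible
# designs up front so the expensive evaluator only sees valid ones.
# Constraint and cost are illustrative inventions.

def valid(design):
    pes, buf_kib = design
    return buf_kib <= 4 * pes  # e.g. buffer must be sized to the PE array

def cost(design):              # stand-in for an expensive simulation
    pes, buf_kib = design
    return 1000 / pes + buf_kib / 8

grid = [(p, b) for p in (8, 16, 32, 64) for b in (32, 64, 128, 256, 512)]
feasible = [d for d in grid if valid(d)]
best = min(feasible, key=cost)
print(best)
```

In the paper's setting the validity structure is learned rather than hard-coded, but the payoff is the same: the optimizer never wastes a sample on a design that cannot be built.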
dMazeRunner: Executing Perfectly Nested Loops on Dataflow Accelerators
dMazeRunner is proposed to efficiently and accurately explore the vast space of ways to spatiotemporally execute a perfectly nested loop on dataflow accelerators (execution methods); the solutions discovered by dMazeRunner are on average 9.16× better in Energy-Delay Product (EDP) and 5.83× better in execution time, as compared to prior approaches.
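Exhaustive mapping-space search against an analytical EDP model can be sketched as follows. The per-tile delay and energy formulas here are toy stand-ins invented for illustration; dMazeRunner's actual analytical model is far more detailed.

```python
# Toy sketch: enumerate legal tile sizes for a loop of N iterations and
# pick the one minimising Energy-Delay Product (EDP). The cost model is
# a made-up stand-in, not dMazeRunner's.

N = 256  # loop trip count to tile

def edp(tile):
    trips = N // tile                    # outer-loop iterations
    delay = trips * (tile + 10)          # per-tile compute + fixed overhead
    energy = trips * (tile * 2 + 50)     # per-tile MACs + buffer refill
    return energy * delay

tiles = [t for t in (8, 16, 32, 64, 128) if N % t == 0]
best = min(tiles, key=edp)
print(best)
```

Even in this toy, larger tiles amortise the fixed per-tile overheads, which is the kind of tradeoff an analytical model lets the explorer rank without simulating each execution method.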
Mind mappings: enabling efficient algorithm-accelerator mapping space search
This work proposes Mind Mappings, a gradient-based search method for algorithm-accelerator mapping space search that derives a smooth, differentiable approximation to the otherwise non-smooth, non-convex search space.
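The mechanism can be sketched in miniature: replace the non-smooth mapping cost with a smooth surrogate, descend its gradient in a continuous relaxation, then snap back to the discrete mapping space. The quadratic surrogate and its analytic gradient below are invented for illustration; Mind Mappings learns its surrogate with a neural network.

```python
# Hedged sketch of gradient-based mapping search over a smooth surrogate.
# Surrogate, gradient, and the "mapping knob" are illustrative inventions.

def surrogate(x):              # smooth approximation of mapping cost
    return (x - 3.0) ** 2 + 1.0

def grad(x):                   # its analytic gradient
    return 2.0 * (x - 3.0)

x = 0.0                        # continuous relaxation of a mapping knob
for _ in range(100):
    x -= 0.1 * grad(x)         # plain gradient descent on the surrogate
tile = round(x)                # snap back to the discrete mapping space
print(tile)
```

The contrast with the black-box methods above is that each step here uses gradient information from the surrogate rather than sampling the search space blindly.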