• Corpus ID: 229642122

Apollo: Transferable Architecture Exploration

Authors: Amir Yazdanbakhsh, Christof Angermueller, Berkin Akin, Yanqi Zhou, Albin Jones, Milad Hashemi, Kevin Swersky, Satrajit Chatterjee, Ravi Narayanaswami and James Laudon
The looming end of Moore’s Law and the ascending use of deep learning drive the design of custom accelerators that are optimized for specific neural architectures. Architecture exploration for such accelerators forms a challenging constrained optimization problem over a complex, high-dimensional, and structured input space with a costly-to-evaluate objective function. Existing approaches for accelerator design are sample-inefficient and do not transfer knowledge between related optimization tasks…


A full-stack search technique for domain optimized deep learning accelerators

This paper analyzes bottlenecks in state-of-the-art vision and natural language processing (NLP) models, including EfficientNet and BERT, uses FAST to design accelerators capable of addressing these bottlenecks, and shows that FAST-generated accelerators can potentially be practical for moderate-sized datacenter deployments.

Data-Driven Offline Optimization For Architecting Hardware Accelerators

This paper develops a data-driven offline optimization method for designing hardware accelerators, dubbed PRIME, that learns a conservative, robust estimate of the desired cost function, utilizes infeasible points and optimizes the design against this estimate without any additional simulator queries during optimization.

AIRCHITECT: Learning Custom Architecture Design and Mapping Space

This paper designs and trains a custom network architecture called AIRCHITECT, which is capable of learning the architecture design space with as high as 94.3% test accuracy and predicting optimal configurations that achieve on average (GeoMean) 99.9% of the best possible performance on a test dataset of 10^5 GEMM workloads.

ACDSE: A Design Space Exploration Method for CNN Accelerator based on Adaptive Compression Mechanism

A novel design space exploration method named ACDSE is provided for optimizing the design process of CNN accelerators, which implements the adaptive compression mechanism to dynamically adjust the search range and prune low-value design points according to the exploration states.

Learning A Continuous and Reconstructible Latent Space for Hardware Accelerator Design

This work designs a variational autoencoder (VAE)-based design space exploration framework called VAESA to encode the hardware design space in a compact and continuous representation, and shows that black-box and gradient-based design space exploration algorithms can be applied to the latent space, and that design points optimized in the latent space can be reconstructed into high-performance, realistic hardware designs.

A Learning-based Approach Towards Automated Tuning of SSD Configurations

LearnedSSD, an automated learning-based framework that utilizes both supervised and unsupervised machine learning techniques to drive the tuning of hardware configurations for SSDs, accelerates the development of new SSD devices by automating the hardware parameter configuration and reducing manual effort.

uSystolic: Byte-Crawling Unary Systolic Array

  • Di Wu, Joshua San Miguel
  • Computer Science
    2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA)
  • 2022
This work designs a hybrid unary-binary systolic array, uSystolic, to inherit the legacy binary data scheduling with slow (thus power-efficient) data movement, i.e., data bytes are crawling out from memory to drive uSystolic.

AutoPilot: Automating SoC Design Space Exploration for SWaP Constrained Autonomous UAVs

The need for holistic full-UAV co-design to achieve maximum overall UAV performance and the need for automated flows to simplify the design process for autonomous cyber-physical systems are demonstrated.

Encoding categorical variables in physics-informed graphs for Bayesian Optimization

This work presents a method for reshaping and simplifying graph structures based on prior physical knowledge, encoded as physics-informed graphs, which improves optimization performance in comparison to the default COMBO approach and other state-of-the-art optimization techniques.

Open Source Vizier: Distributed Infrastructure and API for Reliable and Flexible Blackbox Optimization

Open Source (OSS) Vizier is introduced, a standalone Python-based interface for blackbox optimization and research, based on the Google-internal Vizier infrastructure and framework, designed to be a distributed system that assures reliability, and allows multiple parallel evaluations of the user’s objective function.


Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture

The composable, parallel and pipeline (CPP) microarchitecture is proposed as an accelerator design template to substantially reduce the design space and the AutoAccel framework is developed to automate the entire accelerator generation process.

Spatial: a language and compiler for application accelerators

This work describes a new domain-specific language and compiler called Spatial for higher level descriptions of application accelerators, and summarizes the compiler passes required to support these abstractions, including pipeline scheduling, automatic memory banking, and automated design tuning driven by active machine learning.

Bayesian Multi-objective Hyperparameter Optimization for Accurate, Fast, and Efficient Neural Network Accelerator Design

This work presents a hierarchical pseudo agent-based multi-objective Bayesian hyperparameter optimization framework that not only maximizes the performance of the network, but also minimizes the energy and area requirements of the corresponding neuromorphic hardware.

Accelerator-aware Neural Network Design using AutoML

A class of computer vision models designed using hardware-aware neural architecture search and customized to run on the Edge TPU, Google's neural network hardware accelerator for low-power, edge devices, that enable real-time image classification performance while achieving accuracy typically seen only with larger, compute-heavy models running in data centers.

OpenTuner: An extensible framework for program autotuning

The efficacy and generality of OpenTuner are demonstrated by building autotuners for 7 distinct projects and 16 total benchmarks, showing speedups over prior techniques of these projects of up to 2.8× with little programmer effort.

Meta-Learning Acquisition Functions for Transfer Learning in Bayesian Optimization

This work proposes a novel transfer learning method to obtain customized optimizers within the well-established framework of Bayesian optimization, allowing the algorithm to utilize the proven generalization capabilities of Gaussian processes.
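The meta-learned acquisition functions above slot into the same loop as a standard Bayesian optimizer; a minimal sketch of that underlying loop, using a Gaussian-process surrogate with a hand-designed expected-improvement acquisition and a made-up 1-D toy objective (illustrative only, not the paper's method):

```python
import numpy as np
from scipy.stats import norm

def rbf(a, b, ls=0.2):
    """Squared-exponential kernel between 1-D sample arrays a and b."""
    d = a.reshape(-1, 1) - b.reshape(1, -1)
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """GP posterior mean and standard deviation at x_query."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    k_star = rbf(x_train, x_query)
    alpha = np.linalg.solve(K, y_train)
    mu = k_star.T @ alpha
    v = np.linalg.solve(K, k_star)
    var = 1.0 - np.sum(k_star * v, axis=0)   # prior variance is 1 for RBF
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """EI for maximization: expected gain over the incumbent `best`."""
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

def bayes_opt(objective, n_init=3, n_iter=10, seed=0):
    """Maximize a vectorized objective on [0, 1] with GP + EI."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 1, n_init)
    y = objective(x)
    grid = np.linspace(0, 1, 200)            # candidate pool
    for _ in range(n_iter):
        mu, sigma = gp_posterior(x, y, grid)
        x_next = grid[np.argmax(expected_improvement(mu, sigma, y.max()))]
        x = np.append(x, x_next)
        y = np.append(y, objective(np.array([x_next])))
    return x[np.argmax(y)], y.max()

# Toy objective with a single interior maximum (illustrative only).
f = lambda x: np.sin(3 * x) * np.exp(-((x - 0.7) ** 2))
```

Transfer-learning approaches like the one above replace `expected_improvement` with a criterion meta-learned across related tasks, while the surrogate and the outer loop stay the same.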

FlexiBO: Cost-Aware Multi-Objective Optimization of Deep Neural Networks

FlexiBO, a flexible Bayesian optimization method, is proposed; compared to other state-of-the-art methods across the 7 architectures the authors tested, the Pareto front obtained using FlexiBO has, on average, a 28.44% higher contribution to the true Pareto front and achieves 25.64% better diversity.
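The Pareto-front metrics above rest on non-dominated sorting of candidate designs; a minimal sketch of extracting the Pareto front from a set of design points, using hypothetical two-objective data (both objectives minimized, e.g. latency and energy):

```python
def pareto_front(points):
    """Return the non-dominated subset of `points`.

    Each point is a tuple of objective values, all to be minimized.
    A point p is dominated if some other point q is <= p in every
    objective and strictly < in at least one.
    """
    front = []
    for p in points:
        dominated = any(
            all(q[i] <= p[i] for i in range(len(p)))
            and any(q[i] < p[i] for i in range(len(p)))
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

# Made-up (latency, energy) trade-off points for illustration.
designs = [(3, 9), (5, 4), (7, 2), (6, 6), (4, 8)]
```

Here `(6, 6)` is dominated by `(5, 4)` and drops out, while the remaining points trade off the two objectives and form the front.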

Model-based reinforcement learning for biological sequence design

A model-based variant of PPO, DyNA-PPO, is proposed to improve sample efficiency; it performs significantly better than existing methods in settings where modeling is feasible, while performing no worse in settings where a reliable model cannot be learned.

Population-Based Black-Box Optimization for Biological Sequence Design

It is shown that P3BO outperforms any single method in its population, proposing higher-quality sequences as well as more diverse batches, and that P3BO and Adaptive-P3BO are a crucial step towards deploying ML to real-world sequence design.
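The population idea can be sketched in a few lines, assuming simple stand-in optimizers (random search and hill climbing) and a toy 1-D objective; credit is shifted toward whichever member recently improved the best point, a simplification of the paper's adaptive allocation scheme (all names and numbers here are illustrative, not from the paper):

```python
import random

def random_search(best_x, rng, scale):
    """Propose a fresh uniform sample, ignoring history."""
    return rng.uniform(-5, 5)

def hill_climb(best_x, rng, scale):
    """Propose a small Gaussian perturbation of the incumbent."""
    return best_x + rng.gauss(0, scale)

def population_optimize(objective, n_rounds=30, seed=0):
    """Maximize `objective` with a credit-weighted population of optimizers."""
    rng = random.Random(seed)
    members = [random_search, hill_climb]
    credit = [1.0, 1.0]                      # sampling weight per member
    best_x, best_y = 0.0, objective(0.0)
    for _ in range(n_rounds):
        # Pick a member proportionally to its accumulated credit.
        idx = rng.choices(range(len(members)), weights=credit)[0]
        x = members[idx](best_x, rng, scale=0.5)
        y = objective(x)
        if y > best_y:
            best_x, best_y = x, y
            credit[idx] += 1.0               # reward the improving member
        credit = [0.9 * c + 0.1 for c in credit]  # decay toward uniform
    return best_x, best_y

# Toy objective with maximum 1.0 at x = 2 (illustrative only).
f = lambda x: 1.0 - (x - 2.0) ** 2 / 50.0
```

The decay step keeps exploration alive: a member that stalls loses credit but never reaches zero sampling probability, mirroring the softmax-style reweighting used by adaptive population methods.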