Corpus ID: 235446538

HELP: Hardware-Adaptive Efficient Latency Predictor for NAS via Meta-Learning

Hayeon Lee, Sewoong Lee, Song Chong, Sung Ju Hwang
For deployment, neural architecture search should be hardware-aware in order to satisfy device-specific constraints (e.g., memory usage, latency, and energy consumption) and improve model efficiency. Existing methods for hardware-aware NAS collect a large number of samples (e.g., accuracy and latency) from a target device and either build a lookup table or train a latency estimator. However, such an approach is impractical in real-world scenarios, as there exist numerous devices with different…
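The two estimation strategies the abstract contrasts can be sketched as follows. This is a hypothetical illustration, not code from the paper: the operator names and per-op latency numbers are invented, and the lookup-table approach shown here assumes latencies compose additively.

```python
# Hypothetical sketch of a per-device latency lookup table, the simpler
# of the two strategies the abstract mentions. All op names and latency
# values are illustrative, not real measurements.

# Lookup table: measured latency (ms) of each primitive op on one device.
op_latency_ms = {
    "conv3x3": 1.8,
    "conv5x5": 3.1,
    "skip":    0.1,
    "maxpool": 0.4,
}

def table_latency(architecture):
    """Estimate end-to-end latency by summing per-op measurements.

    Assumes op latencies compose additively, which ignores scheduling
    and operator-fusion effects -- a known limitation of lookup tables,
    and one reason learned latency estimators are used instead.
    """
    return sum(op_latency_ms[op] for op in architecture)

arch = ["conv3x3", "conv3x3", "maxpool", "conv5x5", "skip"]
print(round(table_latency(arch), 2))  # 7.2 with these illustrative numbers
```

The impracticality the abstract points out is that this table (or a learned estimator trained on such samples) must be rebuilt from scratch for every new target device.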
MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge
This work proposes MAPLE-X, which extends MAPLE by incorporating explicit prior knowledge of hardware devices and DNN architecture latency to better account for model stability and robustness, demonstrating a 5% improvement over MAPLE and 9% over HELP.
MAPLE: Microprocessor A Priori for Latency Estimation
The proposed Microprocessor A Priori for Latency Estimation (MAPLE) provides a versatile and practical latency prediction methodology for DNN run-time inference on multiple hardware devices while not imposing any significant overhead for sample collection.
What to expect of hardware metric predictors in NAS
It is shown that simply verifying the predictions of just the selected architectures can lead to substantially improved results, and that under a time budget, it is preferable to use a fast but inaccurate prediction model over accurate but slow live measurements.
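The verify-the-selected-architectures idea can be sketched in a few lines. Everything here is a synthetic stand-in: `true_latency` plays the role of a slow on-device measurement and `fast_predictor` a cheap but noisy model; neither comes from the paper.

```python
import random

random.seed(0)

# Hedged sketch: rank many architectures with a fast, inaccurate
# predictor, then spend the scarce measurement budget verifying only
# the predicted-best candidates with "live" measurements.

def true_latency(arch_id):
    # Stand-in for an expensive on-device measurement (synthetic).
    return (arch_id * 37) % 100 + 1.0

def fast_predictor(arch_id):
    # Cheap but inaccurate: the true latency plus Gaussian noise.
    return true_latency(arch_id) + random.gauss(0, 15)

candidates = list(range(200))
# Rank everything with the cheap predictor...
ranked = sorted(candidates, key=fast_predictor)
# ...then verify only the predicted-best k with real measurements.
top_k = ranked[:10]
best = min(top_k, key=true_latency)

# Verifying the top-k can only improve on blindly trusting the
# predictor's single best guess:
print(true_latency(best) <= true_latency(ranked[0]))  # True
```

The guarantee in the final comparison is structural: the predictor's top pick is inside the verified set, so measuring the whole set can never do worse.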
A Hardware-Aware Framework for Accelerating Neural Architecture Search Across Modalities
It is shown how evolutionary algorithms can be paired with lightly trained objective predictors in an iterative cycle to accelerate architecture search in a multi-objective setting for various modalities including machine translation and image classification.
PR-DARTS: Pruning-Based Differentiable Architecture Search
This work proposes two novel PrunedConv and PrunedLinear operations that mitigate the problem of unstable gradients by regularizing the objective function of the pruned networks, and outperforms state-of-the-art pruning-based networks on CIFAR-10 and ImageNet.
COBRA is introduced, which leverages a transformer encoder to learn representations of short code snippets that are aggregated by a Graph Convolutional Network capturing the algorithmic dependencies, and which estimates the latency of the implemented DNN.
Accelerating neural architecture exploration across modalities using genetic algorithms
It is shown how genetic algorithms can be paired with lightly trained objective predictors in an iterative cycle to accelerate multi-objective architectural exploration in the modalities of both machine translation and image classification.
FedorAS: Federated Architecture Search under system heterogeneity
The FedorAS system is designed to discover and train promising architectures when dealing with devices of varying capabilities holding non-IID distributed data, and shows better performance than state-of-the-art federated solutions while maintaining resource efficiency.
One Proxy Device Is Enough for Hardware-Aware Neural Architecture Search
The results highlight that, by using just one proxy device, one can find almost the same Pareto-optimal architectures as the existing per-device NAS, while avoiding the prohibitive cost of building a latency predictor for each device.
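The proxy-device claim rests on latency rankings transferring across devices. A minimal sketch of how one might check that, assuming illustrative latency numbers (not measurements from the paper) and a hand-rolled Spearman rank correlation:

```python
# Hypothetical sketch: if latency *rankings* on a cheap proxy device
# correlate strongly with those on the target device, Pareto-optimal
# search can be run against the proxy alone. Numbers are illustrative.

proxy_latency  = [5.0, 7.2, 3.1, 9.8, 6.4]      # ms on the proxy device
target_latency = [11.0, 15.8, 6.9, 21.5, 14.1]  # ms on the target device

def rank(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for pos, i in enumerate(order):
        r[i] = pos
    return r

def spearman(xs, ys):
    """Spearman rank correlation (no-ties formula)."""
    rx, ry = rank(xs), rank(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

rho = spearman(proxy_latency, target_latency)
print(rho)  # 1.0 here: the two rankings agree exactly
```

A correlation near 1 means the proxy preserves the ordering that Pareto selection depends on, which is the condition the result above exploits.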
HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark
HW-NAS-Bench is developed, the first public dataset for HW-NAS research, which aims to democratize HW-NAS research to non-hardware experts and make HW-NAS research more reproducible and accessible, and verifies that dedicated device-specific HW-NAS can indeed lead to optimal accuracy-cost trade-offs.
Latency-Aware Differentiable Neural Architecture Search
Equipped with this module, the search method can reduce latency by 20% while preserving accuracy, can be transplanted to a wide range of hardware platforms with little effort, and can be used to optimize other non-differentiable factors such as power consumption.
Once for All: Train One Network and Specialize it for Efficient Deployment
This work proposes to train a once-for-all (OFA) network that supports diverse architectural settings, decoupling training and search to reduce the cost, and proposes a novel progressive shrinking algorithm, a generalized pruning method that reduces the model size across many more dimensions than pruning.
BRP-NAS: Prediction-based NAS using GCNs
BRP-NAS is proposed, an efficient hardware-aware NAS enabled by an accurate performance predictor based on a graph convolutional network (GCN), which outperforms all prior methods on NAS-Bench-101, NAS-Bench-201, and DARTS.
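A GCN-based predictor in this spirit treats the architecture as a DAG whose nodes carry operation features. The sketch below is untrained and purely illustrative (random weights, a made-up 4-node cell); it only demonstrates the shape of the computation, not BRP-NAS itself.

```python
import numpy as np

# Minimal, untrained sketch of a GCN-style latency predictor: node
# features are one-hot op types, messages propagate along the
# architecture DAG, and a pooled readout regresses a single latency
# value. Weights are random, so the output is meaningless until trained.

rng = np.random.default_rng(0)

# A 4-node cell: 0 -> 1 -> 3 and 0 -> 2 -> 3 (adjacency of the DAG).
A = np.array([[0, 1, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1],
              [0, 0, 0, 0]], dtype=float)
A_hat = A + np.eye(4)                      # add self-loops
D_inv = np.diag(1.0 / A_hat.sum(axis=1))   # row-normalise the propagation

X = np.eye(4)                  # one-hot "op type" features, one per node
W1 = rng.normal(size=(4, 8))   # random layer weights (illustrative)
W2 = rng.normal(size=(8, 1))

H = np.maximum(D_inv @ A_hat @ X @ W1, 0)         # graph conv + ReLU
latency = float((D_inv @ A_hat @ H @ W2).mean())  # mean-pool readout
print(latency)  # one scalar prediction per architecture graph
```

In a real predictor the weights would be regressed against measured latencies; the point of the sketch is that the graph structure of the cell, not just its op counts, feeds the prediction.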
ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
ProxylessNAS is presented, which can directly learn architectures for large-scale target tasks and target hardware platforms; it is applied to specialize neural architectures for hardware using direct hardware metrics (e.g., latency), and provides insights for efficient CNN architecture design.
PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search
This paper presents a novel approach, namely Partially-Connected DARTS, which samples a small part of the super-network to reduce the redundancy in exploring the network space, thereby performing a more efficient search without compromising performance.
AOWS: Adaptive and Optimal Network Width Search With Latency Constraints
This work introduces a novel efficient one-shot NAS approach to optimally search for channel numbers, given latency constraints on a specific hardware, and proposes an adaptive channel configuration sampling scheme to gradually specialize the training phase to the target computational constraints.
Efficient Neural Architecture Search via Parameter Sharing
Efficient Neural Architecture Search is a fast and inexpensive approach for automatic model design that establishes a new state-of-the-art among all methods without post-training processing and delivers strong empirical performance using far fewer GPU-hours.
HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
This work designs Hardware-Aware Transformers with neural architecture search: it trains a SuperTransformer that covers all candidates in the design space, efficiently produces many SubTransformers with weight sharing, and performs an evolutionary search under a hardware latency constraint.
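The evolutionary-search-under-a-latency-constraint loop can be sketched generically. Everything below is a synthetic stand-in: the configuration encoding, the latency and accuracy proxies, and the budget are all invented for illustration and are not HAT's actual search.

```python
import random

random.seed(1)

# Hedged sketch of evolutionary search with a hardware latency budget:
# sample sub-network configurations, discard any whose (predicted)
# latency exceeds the budget, and evolve survivors toward accuracy.

def latency(cfg):
    # Pretend latency predictor: wider layers cost more (synthetic).
    return sum(cfg)

def accuracy(cfg):
    # Pretend accuracy proxy: capacity helps, with diminishing returns.
    return sum(c ** 0.5 for c in cfg)

BUDGET = 20  # illustrative latency budget (same units as latency())

def mutate(cfg):
    out = list(cfg)
    out[random.randrange(len(out))] = random.choice([2, 4, 8])
    return out

population = [[2, 2, 2, 2] for _ in range(8)]
for _ in range(50):
    children = [mutate(random.choice(population)) for _ in range(16)]
    # The hardware constraint acts as a hard filter before selection.
    feasible = [c for c in children + population if latency(c) <= BUDGET]
    population = sorted(feasible, key=accuracy, reverse=True)[:8]

best = population[0]
print(best, latency(best) <= BUDGET)  # best config always within budget
```

Because infeasible children are filtered before selection, every surviving configuration respects the latency budget by construction, which is what makes the constraint "hard" rather than a soft penalty.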
Towards Fast Adaptation of Neural Architectures with Meta Learning
A novel Transferable Neural Architecture Search method based on meta-learning, which learns a meta-architecture able to adapt to a new task quickly through a few gradient steps, making the transferred architecture suitable for the specific task.