Corpus ID: 29024192

NeuralPower: Predict and Deploy Energy-Efficient Convolutional Neural Networks

@article{Cai2017NeuralPowerPA,
  title={NeuralPower: Predict and Deploy Energy-Efficient Convolutional Neural Networks},
  author={Ermao Cai and Da-Cheng Juan and Dimitrios Stamoulis and Diana Marculescu},
  journal={ArXiv},
  year={2017},
  volume={abs/1710.05420}
}
"How much energy is consumed for an inference made by a convolutional neural network (CNN)?" With the increased popularity of CNNs deployed on the wide-spectrum of platforms (from mobile devices to workstations), the answer to this question has drawn significant attention. From lengthening battery life of mobile devices to reducing the energy bill of a datacenter, it is important to understand the energy efficiency of CNNs during serving for making an inference, before actually training the… 

Hardware-Aware Machine Learning: Modeling and Optimization

A comprehensive assessment of state-of-the-art work and selected results on hardware-aware modeling and optimization for ML applications is provided, and several open questions are highlighted that are poised to give rise to novel hardware-aware designs in the next few years.

Performance Prediction for Convolutional Neural Networks in Edge Devices

Five widely used machine-learning-based methods for predicting the execution time of CNNs on two edge GPU platforms are presented; experimental results show that eXtreme Gradient Boosting achieves an average prediction error below 14.73%, even for unexplored and unseen CNN architectures.
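
A hedged sketch of this kind of learned latency predictor, using xgboost's scikit-learn wrapper on synthetic architecture features (the feature set and data are illustrative, not the paper's):

```python
# Gradient-boosted regression for whole-model CNN latency prediction.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
# Toy training set: [total FLOPs, total params, depth, max feature-map size]
X_train = rng.uniform(size=(200, 4))
y_train = X_train @ np.array([5.0, 1.0, 0.5, 2.0]) + rng.normal(0, 0.1, 200)

model = xgb.XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.05)
model.fit(X_train, y_train)

unseen_cnn = rng.uniform(size=(1, 4))  # features of an unexplored architecture
print("predicted latency (ms):", model.predict(unseen_cnn)[0])
```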

Energy‐based tuning of convolutional neural networks on multi‐GPUs

This work evaluates the energy consumption of four networks derived from the two most popular ones, ResNet and AlexNet, and correlates it with performance; experimental results on a multi-GPU server equipped with twin Maxwell and twin Pascal Titan X GPUs demonstrate that energy correlates with performance and that Pascal can deliver up to 40% energy gains over Maxwell.

Blackthorn: Latency Estimation Framework for CNNs on Embedded Nvidia Platforms

This work proposes Blackthorn, a layer-wise latency estimation framework for embedded Nvidia GPUs based on analytical models that provides accurate predictions for each layer, helping developers to find bottlenecks and optimize the architecture of a DNN to fit target platforms.
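
The flavor of such an analytical layer model can be sketched as a roofline-style bound; the platform constants below are assumed placeholders, not Blackthorn's fitted parameters:

```python
# Illustrative analytical layer-latency model: latency is bounded by compute
# throughput or memory bandwidth, whichever dominates on the device.
def conv_layer_latency_ms(h, w, c_in, c_out, k,
                          peak_gflops=500.0,    # assumed device compute peak
                          bandwidth_gbs=25.0):  # assumed device memory bandwidth
    flops = 2.0 * h * w * c_in * c_out * k * k  # multiply-accumulates * 2
    bytes_moved = 4.0 * (h * w * c_in + h * w * c_out + k * k * c_in * c_out)
    compute_ms = flops / (peak_gflops * 1e9) * 1e3
    memory_ms = bytes_moved / (bandwidth_gbs * 1e9) * 1e3
    return max(compute_ms, memory_ms)

# Per-layer estimates sum to a whole-network estimate and expose bottlenecks.
print(conv_layer_latency_ms(56, 56, 64, 64, 3))
```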

Energy Predictive Models for Convolutional Neural Networks on Mobile Platforms

This work provides a comprehensive analysis of building regression-based predictive models for deep learning on mobile devices, based on empirical measurements gathered from the SyNERGY framework, and shows that simple layer-type features achieve a model complexity of 4 to 32 times less for convolutional layer predictions for a similar accuracy compared to predictive models using more complex features adopted by previous approaches.
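
A minimal sketch of such a simple-feature regressor, assuming a single MAC-count feature per layer and synthetic measurements:

```python
# Energy regression from multiply-accumulate (MAC) counts alone, echoing the
# finding that plain layer-type features suffice. Data below is synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
macs = rng.uniform(1e6, 1e9, size=(100, 1))            # MACs per layer
energy_mj = 3e-7 * macs[:, 0] + rng.normal(0, 5, 100)  # toy measurements

model = LinearRegression().fit(macs, energy_mj)
print("predicted energy (mJ) for 5e8 MACs:", model.predict([[5e8]])[0])
```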

Designing Adaptive Neural Networks for Energy-Constrained Image Classification

This work casts the design of adaptive CNNs as a hyper-parameter optimization problem with respect to energy, accuracy, and communication constraints imposed by the mobile device, and adapt Bayesian optimization to the properties of the design space, reaching near-optimal configurations in few tens of function evaluations.
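
A minimal sketch of this kind of constraint-aware Bayesian optimization, here using scikit-optimize's gp_minimize with toy accuracy and energy models standing in for real training and measurement:

```python
# Bayesian optimization over CNN hyper-parameters under an energy budget.
# The objective and energy model below are illustrative stand-ins.
from skopt import gp_minimize

ENERGY_BUDGET_MJ = 50.0

def objective(params):
    depth, width = params
    accuracy = 0.6 + 0.03 * depth + 0.002 * width  # toy accuracy model
    energy = 2.0 * depth * width / 10.0            # toy energy model (mJ)
    penalty = max(0.0, energy - ENERGY_BUDGET_MJ)  # soft constraint
    return -accuracy + 0.01 * penalty              # minimize

result = gp_minimize(objective,
                     dimensions=[(2, 20), (16, 256)],  # depth, width ranges
                     n_calls=30, random_state=0)
print("best (depth, width):", result.x)
```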

Improving QoE of Deep Neural Network Inference on Edge Devices: A Bandit Approach

A novel automated and user-centric DNN selection engine, called Aquaman, which keeps users in a closed loop and leverages their QoE feedback to guide DNN selection decisions.
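
The closed-loop selection idea can be illustrated with a classic UCB1 bandit over candidate DNNs, with a toy stand-in for real QoE feedback (this is not Aquaman's actual algorithm):

```python
# UCB1 bandit for choosing among candidate DNNs from QoE feedback.
import math, random

models = ["small_cnn", "medium_cnn", "large_cnn"]
counts = [0] * len(models)
rewards = [0.0] * len(models)

def qoe_feedback(i):  # stand-in for real user QoE measurements
    return random.gauss([0.6, 0.75, 0.7][i], 0.1)

for t in range(1, 501):
    # Play each arm once, then pick by the UCB1 upper confidence bound.
    if 0 in counts:
        arm = counts.index(0)
    else:
        arm = max(range(len(models)),
                  key=lambda i: rewards[i] / counts[i]
                  + math.sqrt(2 * math.log(t) / counts[i]))
    counts[arm] += 1
    rewards[arm] += qoe_feedback(arm)

print("selected most often:", models[counts.index(max(counts))])
```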

SCANN: Synthesis of Compact and Accurate Neural Networks

A two-step neural network synthesis methodology, called DR+SCANN, that combines the SCANN synthesis approach with dataset dimensionality reduction to design compact and accurate DNNs while alleviating the curse of dimensionality.
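
An illustrative two-step pipeline in this spirit, with PCA and a small scikit-learn MLP standing in for the paper's dimensionality reduction and SCANN synthesis:

```python
# Step 1: reduce input dimensionality; step 2: train a compact network on
# the reduced features. Dataset, method, and network size are placeholders.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

pca = PCA(n_components=16).fit(X_tr)             # dimensionality reduction
clf = MLPClassifier(hidden_layer_sizes=(32,),    # compact network
                    max_iter=500, random_state=0).fit(pca.transform(X_tr), y_tr)
print("accuracy:", clf.score(pca.transform(X_te), y_te))
```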

Efficient Resource-Aware Convolutional Neural Architecture Search for Edge Computing with Pareto-Bayesian Optimization

An efficient resource-aware Pareto Bayesian search method that automatically finds neural networks with high accuracy that satisfy given hardware performance requirements; experiments demonstrate that the inference latency of the searched network structure is reduced without sacrificing accuracy.
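
The Pareto filtering at the core of such multi-objective search can be sketched directly; the (error, latency) points below are illustrative:

```python
# Keep architectures not dominated on (error, latency), both minimized.
def dominates(a, b):
    """True if a is no worse than b in all objectives and strictly
    better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other != c)]

# (error rate, latency ms) for evaluated architectures
evaluated = [(0.08, 30.0), (0.06, 55.0), (0.09, 25.0), (0.07, 60.0)]
print(pareto_front(evaluated))  # (0.07, 60.0) is dominated by (0.06, 55.0)
```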

perf4sight: A toolflow to model CNN training performance on Edge GPUs

  • A. Rajagopal, C. Bouganis
  • Computer Science
    2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
  • 2021
The increased memory and processing capabilities of today’s edge devices create opportunities for greater edge intelligence. In the domain of vision, the ability to adapt a Convolutional Neural…

References

Showing 1-10 of 19 references

DeLight: Adding Energy Dimension To Deep Neural Networks

This paper uses energy characterization to bound the network size in accordance with the available physical resources, and an automated customization methodology to adaptively conform DNN configurations to the underlying hardware characteristics while minimally affecting inference accuracy.

Paleo: A Performance Model for Deep Neural Networks

This work introduces an analytical performance model, PALEO, which can efficiently and accurately model the expected scalability and performance of a putative deep learning system, and is robust to the choice of network architecture, hardware, software, communication scheme, and parallelization strategy.
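
A hedged sketch of a Paleo-style estimate, deriving layer time from FLOP counts and an assumed effective platform throughput (the efficiency factor is a made-up calibration constant):

```python
# Layer time estimated from its FLOP count and sustained device throughput.
def layer_time_ms(flops, peak_gflops=6000.0, efficiency=0.5):
    effective = peak_gflops * 1e9 * efficiency  # sustained FLOP/s
    return flops / effective * 1e3

def conv_flops(h, w, c_in, c_out, k):
    return 2.0 * h * w * c_in * c_out * k * k

total_ms = sum(layer_time_ms(conv_flops(*cfg))
               for cfg in [(224, 224, 3, 64, 7), (56, 56, 64, 64, 3)])
print(f"estimated forward time: {total_ms:.2f} ms")
```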

BinaryConnect: Training Deep Neural Networks with binary weights during propagations

BinaryConnect is introduced, a method for training a DNN with binary weights during the forward and backward propagations while retaining the precision of the stored weights in which gradients are accumulated; near state-of-the-art results are obtained with BinaryConnect on permutation-invariant MNIST, CIFAR-10, and SVHN.
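
A minimal numpy sketch of the core mechanism, binarizing weights for propagation while accumulating updates in real-valued weights (a single linear layer stands in for a full DNN):

```python
# Binarize weights for the forward/backward pass; update the real weights.
import numpy as np

rng = np.random.default_rng(0)
W_real = rng.normal(0, 0.1, (4, 3))  # real-valued stored weights

def forward(x, W_real):
    W_bin = np.sign(W_real)          # propagate with binary weights
    W_bin[W_bin == 0] = 1.0
    return x @ W_bin

x = rng.normal(size=(2, 4))
grad_out = rng.normal(size=(2, 3))   # stand-in upstream gradient
grad_W = x.T @ grad_out              # gradient w.r.t. the binary weights
W_real -= 0.01 * grad_W              # but the *real* weights accumulate updates
W_real = np.clip(W_real, -1.0, 1.0)  # clipping keeps weights in [-1, 1]
```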

Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

A binary matrix multiplication GPU kernel is written with which the MNIST BNN can run seven times faster than with an unoptimized GPU kernel, without any loss in classification accuracy.

Learning both Weights and Connections for Efficient Neural Network

A method to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy: redundant connections are pruned and only the important connections are learned, using a three-step train-prune-retrain method.
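
The prune step of that train-prune-retrain loop can be sketched as magnitude-based masking; training itself is elided here:

```python
# (1) train, (2) remove low-magnitude weights, (3) retrain the survivors.
import numpy as np

def prune_by_magnitude(W, sparsity=0.9):
    """Zero out the smallest-magnitude weights; return pruned W and mask."""
    threshold = np.quantile(np.abs(W), sparsity)
    mask = (np.abs(W) > threshold).astype(W.dtype)
    return W * mask, mask

W = np.random.default_rng(0).normal(size=(256, 256))  # step 1: trained weights
W_pruned, mask = prune_by_magnitude(W, sparsity=0.9)  # step 2: prune

# Step 3: during retraining, gradients are masked so pruned connections
# stay at zero, e.g. W_pruned -= lr * (grad * mask).
print("remaining connections:", int(mask.sum()), "of", mask.size)
```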

Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks

This work implements a CNN accelerator on a VC707 FPGA board, achieving a peak performance of 61.62 GFLOPS at a 100 MHz working frequency and significantly outperforming previous approaches.
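
The roofline reasoning behind such accelerator design choices can be illustrated as follows; the compute and bandwidth roofs are assumed placeholders, not the paper's measured values:

```python
# Attainable throughput is capped by either the compute roof or the
# bandwidth implied by a design's computation-to-communication (CTC) ratio.
def attainable_gflops(ctc_flops_per_byte,
                      peak_gflops=100.0,   # assumed compute roof
                      bandwidth_gbs=4.5):  # assumed DRAM bandwidth
    return min(peak_gflops, ctc_flops_per_byte * bandwidth_gbs)

# Loop-tiling choices change the CTC ratio; pick the design under the roof
# with the highest attainable performance.
for ctc in [5.0, 15.0, 40.0]:
    print(f"CTC {ctc:5.1f} FLOP/byte -> {attainable_gflops(ctc):6.1f} GFLOPS")
```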

Neural networks designing neural networks: Multi-objective hyper-parameter optimization

This work presents a multi-objective design space exploration method that reduces the number of solution networks trained and evaluated through response surface modelling and is evaluated on the MNIST and CIFAR-10 image datasets, optimizing for both recognition accuracy and computational complexity.

Caffe: Convolutional Architecture for Fast Feature Embedding

Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.

Network In Network

With enhanced local modeling via the micro network, the proposed deep network structure NIN is able to utilize global average pooling over feature maps in the classification layer, which is easier to interpret and less prone to overfitting than traditional fully connected layers.
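
Global average pooling itself is a one-liner; a numpy illustration:

```python
# Average each feature map to a single value, replacing a fully
# connected classifier head.
import numpy as np

feature_maps = np.random.default_rng(0).normal(size=(8, 10, 7, 7))  # (N, C, H, W)
logits = feature_maps.mean(axis=(2, 3))  # (N, C): one confidence per class map
print(logits.shape)                      # (8, 10)
```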

Neural Architecture Search with Reinforcement Learning

This paper uses a recurrent network to generate the model descriptions of neural networks and trains this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set.
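
A compact REINFORCE sketch of the idea, with a toy reward standing in for training and validating a child network:

```python
# A softmax policy samples a discrete architecture choice; validation
# accuracy is the reward; the policy gradient raises the probability of
# good choices. The reward function is a toy stand-in for child training.
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(4)  # policy over 4 candidate ops

def reward(arch):     # stand-in for child-network validation accuracy
    return [0.70, 0.92, 0.85, 0.60][arch]

baseline = 0.0
for step in range(200):
    probs = np.exp(logits) / np.exp(logits).sum()
    arch = rng.choice(4, p=probs)
    r = reward(arch)
    baseline = 0.9 * baseline + 0.1 * r  # moving-average baseline
    grad = -probs
    grad[arch] += 1.0                    # d log pi(arch) / d logits
    logits += 0.1 * (r - baseline) * grad

print("preferred op:", int(np.argmax(logits)))
```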