AI Benchmark: All About Deep Learning on Smartphones in 2019

@inproceedings{Ignatov2019AIBA,
  title={AI Benchmark: All About Deep Learning on Smartphones in 2019},
  author={Andrey D. Ignatov and Radu Timofte and Andrei Kulik and Seungsoo Yang and Ke Wang and Felix Baum and Max Wu and Lirong Xu and Luc Van Gool},
  booktitle={2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)},
  year={2019},
  pages={3617--3635}
}
The performance of mobile AI accelerators has been evolving rapidly in the past two years, nearly doubling with each new generation of SoCs. The current 4th generation of mobile NPUs is already approaching the results of CUDA-compatible Nvidia graphics cards presented not long ago, which together with the increased capabilities of mobile deep learning frameworks makes it possible to run complex and deep AI models on mobile devices. In this paper, we evaluate the performance and compare the…
Low-level Optimizations for Faster Mobile Deep Learning Inference Frameworks
TLDR
This PhD research aims to provide developers with a path for choosing the best methods and tools for real-time inference on mobile devices, and demonstrates that the low-level software implementations chosen in frameworks, the model conversion steps, and the parameters set in the framework have a large impact on performance and accuracy.
How to Reach Real-Time AI on Consumer Devices? Solutions for Programmable and Custom Architectures
TLDR
Preliminary answers to this potentially game-changing question are provided by presenting an array of design techniques for efficient AI systems and by examining the major roadblocks when targeting both programmable processors and custom accelerators.
Real-Time Quantized Image Super-Resolution on Mobile NPUs, Mobile AI 2021 Challenge: Report
  • Andrey D. Ignatov, R. Timofte, +20 authors Shengpeng Wang
  • Engineering, Computer Science
  • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
  • 2021
TLDR
The first Mobile AI challenge is introduced, where the target is to develop an end-to-end deep learning-based image super-resolution solution that can demonstrate real-time performance on mobile or edge NPUs.
Real-Time Video Super-Resolution on Smartphones with Deep Learning, Mobile AI 2021 Challenge: Report
TLDR
The target is to develop an end-to-end deep learning-based video super-resolution solution that can achieve real-time performance on mobile GPUs and upscale videos to HD resolution at up to 80 FPS while demonstrating high-fidelity results.
Fast Camera Image Denoising on Mobile GPUs with Deep Learning, Mobile AI 2021 Challenge: Report
  • Andrey D. Ignatov, Kim Byeoung-su, +29 authors Feifei Chen
  • Computer Science, Engineering
  • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
  • 2021
TLDR
The first Mobile AI challenge is introduced, where the target is to develop an end-to-end deep learning-based image denoising solution that can demonstrate high efficiency on smartphone GPUs.
Comparison and Benchmarking of AI Models and Frameworks on Mobile Devices
TLDR
A benchmark suite, AIoTBench, is presented that focuses on evaluating the inference abilities of mobile and embedded devices, and two unified metrics are proposed as AI scores: Valid Images Per Second (VIPS) and Valid FLOPs Per Second (VOPS).
Learned Smartphone ISP on Mobile NPUs with Deep Learning, Mobile AI 2021 Challenge: Report
TLDR
The target was to develop an end-to-end deep learning-based image signal processing (ISP) pipeline that can replace classical hand-crafted ISPs and achieve nearly real-time performance on smartphone NPUs.
AI Tax in Mobile SoCs: End-to-end Performance Analysis of Machine Learning in Smartphones
TLDR
This work characterizes the execution pipeline of open-source ML benchmarks and Android applications in terms of AI tax, the time spent on non-model execution tasks, and discusses where performance bottlenecks may unexpectedly arise.
DynO: Dynamic Onloading of Deep Neural Networks from Cloud to Device
TLDR
DynO is presented, a distributed inference framework that combines the best of both worlds to address several challenges, such as device heterogeneity, varying bandwidth and multi-objective requirements, and outperforms the current state-of-the-art.
AI Tax: The Hidden Cost of AI Data Center Applications
TLDR
It is shown that a purpose-built edge data center can be designed for the stresses of accelerated AI at 15% lower TCO than one derived from homogeneous servers and infrastructure.

References

Showing 1-10 of 82 references
AI Benchmark: Running Deep Neural Networks on Android Smartphones
TLDR
Presents a study of the current state of deep learning in the Android ecosystem, describing available frameworks, programming models, and the limitations of running AI on smartphones, along with an overview of the hardware acceleration resources available on four main mobile chipset platforms.
On-Device Neural Net Inference with Mobile GPUs
TLDR
This paper presents how the mobile GPU, a ubiquitous hardware accelerator available on virtually every phone, is leveraged to run deep neural network inference in real time on both Android and iOS devices, and discusses how to design networks that are mobile-GPU-friendly.
7.1 An 11.5TOPS/W 1024-MAC Butterfly Structure Dual-Core Sparsity-Aware Neural Processing Unit in 8nm Flagship Mobile SoC
TLDR
For mobile systems-on-a-chip (SoCs), energy-efficient neural processing units (NPU) have been studied for performing the convolutional layers (CLs) and fully-connected layers (FCLs) in deep neural networks.
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
TLDR
A quantization scheme is proposed that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating-point inference on commonly available integer-only hardware.
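The integer-only scheme summarized above builds on affine quantization, where a real value x is approximated as scale * (q - zero_point) for an int8 value q. A minimal pure-Python sketch of that mapping, assuming a simple symmetric-range example; the helper names and the [-1, 1] range are illustrative, not the paper's exact TensorFlow Lite implementation:

```python
# Affine (asymmetric) quantization sketch: map reals in [xmin, xmax]
# onto signed 8-bit integers, so that x ≈ scale * (q - zero_point).

def quantize_params(xmin, xmax, qmin=-128, qmax=127):
    """Derive scale and zero_point so [xmin, xmax] maps onto [qmin, qmax]."""
    # The representable range must include 0 so that zero is exact.
    xmin, xmax = min(xmin, 0.0), max(xmax, 0.0)
    scale = (xmax - xmin) / (qmax - qmin)
    zero_point = int(round(qmin - xmin / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Round a real value to its integer code, clamped to the int8 range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Recover the approximate real value from an integer code."""
    return scale * (q - zero_point)

scale, zp = quantize_params(-1.0, 1.0)
q = quantize(0.5, scale, zp)
x = dequantize(q, scale, zp)  # close to 0.5, within one quantization step
```

Reconstruction error is bounded by the step size `scale`, which is why narrow, well-calibrated ranges (the subject of the quantization whitepaper below) matter so much for accuracy.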
Rethinking the Inception Architecture for Computer Vision
TLDR
This work explores ways to scale up networks that aim at utilizing the added computation as efficiently as possible through suitably factorized convolutions and aggressive regularization.
cuDNN: Efficient Primitives for Deep Learning
TLDR
Presents a library similar in intent to BLAS, with optimized routines for deep learning workloads; it contains routines for GPUs and, like the BLAS library, could be implemented for other platforms.
Quantizing deep convolutional networks for efficient inference: A whitepaper
TLDR
An overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations is presented, and per-channel quantization of weights with per-layer quantization of activations is recommended as the preferred quantization scheme for hardware acceleration and kernel optimization.
Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors
TLDR
A unified implementation of the Faster R-CNN, R-FCN and SSD systems is presented and the speed/accuracy trade-off curve created by using alternative feature extractors and varying other critical parameters such as image size within each of these meta-architectures is traced out.
PIRM Challenge on Perceptual Image Enhancement on Smartphones: Report
TLDR
This paper reviews the first challenge on efficient perceptual image enhancement with the focus on deploying deep learning models on smartphones and proposes solutions that significantly improved baseline results defining the state-of-the-art for image enhancement on smartphones.
Flexible, High Performance Convolutional Neural Networks for Image Classification
We present a fast, fully parameterizable GPU implementation of Convolutional Neural Network variants. Our feature extractors are neither carefully designed nor pre-wired, but rather learned in a…