AI Benchmark: All About Deep Learning on Smartphones in 2019

@article{Ignatov2019AIBA,
  title={AI Benchmark: All About Deep Learning on Smartphones in 2019},
  author={Andrey D. Ignatov and Radu Timofte and Andrei Kulik and Seungsoo Yang and Ke Wang and Felix Baum and Max Wu and Lirong Xu and Luc Van Gool},
  journal={2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)},
  year={2019},
  pages={3617-3635}
}
The performance of mobile AI accelerators has been evolving rapidly in the past two years, nearly doubling with each new generation of SoCs. The current 4th generation of mobile NPUs is already approaching the results of CUDA-compatible Nvidia graphics cards presented not long ago, which together with the increased capabilities of mobile deep learning frameworks makes it possible to run complex and deep AI models on mobile devices. In this paper, we evaluate the performance and compare the… 
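
As a rough illustration of the kind of measurement such an evaluation relies on, here is a minimal latency-benchmark sketch using the TensorFlow Lite Python interpreter; the model path is a placeholder, and a real benchmark (such as the AI Benchmark app) additionally controls threads, delegates and thermal state.

    import time
    import numpy as np
    import tensorflow as tf

    # Load a TFLite model; "model.tflite" is a hypothetical path.
    interpreter = tf.lite.Interpreter(model_path="model.tflite")
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]

    # Random input matching the model's expected shape and dtype.
    dummy = np.random.random_sample(inp["shape"]).astype(inp["dtype"])

    # Warm-up runs so one-time initialization does not skew the timing.
    for _ in range(5):
        interpreter.set_tensor(inp["index"], dummy)
        interpreter.invoke()

    # Timed runs; report the average single-inference latency.
    runs = 50
    start = time.perf_counter()
    for _ in range(runs):
        interpreter.set_tensor(inp["index"], dummy)
        interpreter.invoke()
    print(f"avg latency: {1000 * (time.perf_counter() - start) / runs:.2f} ms")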

DEEP LEARNING FRAMEWORKS EVALUATION

The results show that low-level software optimizations, image pre-processing algorithms, the model conversion process, and cooling design all have an impact on latency, accuracy, and energy efficiency.

Low-level Optimizations for Faster Mobile Deep Learning Inference Frameworks

This PhD research aims to provide developers with a path for choosing the best methods and tools for real-time inference on mobile devices, and demonstrates that the low-level software implementations chosen in frameworks, the model conversion steps, and the parameters set in the framework have a large impact on performance and accuracy.
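
Since conversion steps and converter parameters are singled out here, a small sketch may help: the snippet below converts the same toy Keras model twice with TensorFlow Lite, once as plain float32 and once with the converter's default weight quantization. The model is an arbitrary placeholder, not one studied in the thesis.

    import tensorflow as tf

    # Arbitrary placeholder model; any trained Keras model would slot in here.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(224, 224, 3)),
        tf.keras.layers.Conv2D(8, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10),
    ])

    converter = tf.lite.TFLiteConverter.from_keras_model(model)

    # Baseline: plain float32 conversion.
    float_bytes = converter.convert()

    # Same graph with default optimizations (weight quantization) enabled:
    # smaller and often faster on device, but accuracy must be re-validated,
    # which is exactly the point made above.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    quant_bytes = converter.convert()

    print(len(float_bytes), len(quant_bytes))  # the quantized file is smaller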

How to Reach Real-Time AI on Consumer Devices? Solutions for Programmable and Custom Architectures

Preliminary answers to this potentially game-changing question are provided by presenting an array of design techniques for efficient AI systems and examining the major roadblocks when targeting both programmable processors and custom accelerators.

A Comprehensive Benchmark of Deep Learning Libraries on Mobile Devices

It is found that the best-performing DL library is severely fragmented across different models and hardware platforms, and the performance gap between libraries can be huge.

Real-Time Quantized Image Super-Resolution on Mobile NPUs, Mobile AI 2021 Challenge: Report

The first Mobile AI challenge is introduced, where the target is to develop end-to-end deep learning-based image super-resolution solutions that can demonstrate real-time performance on mobile or edge NPUs.
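
As background on what an NPU-ready, quantized solution involves, below is a hedged sketch of post-training full-integer quantization with TensorFlow Lite; the toy network and random calibration data are stand-ins, not the challenge pipeline.

    import numpy as np
    import tensorflow as tf

    # Toy SR-style network: image in, image out. Purely illustrative.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(64, 64, 3)),
        tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(3, 3, padding="same"),
    ])

    def representative_data():
        # A real pipeline feeds a few hundred real images to calibrate ranges.
        for _ in range(8):
            yield [np.random.rand(1, 64, 64, 3).astype(np.float32)]

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_data
    # Restrict to integer-only ops so the model can run on an int8 NPU.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    int8_model = converter.convert()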

Learned Smartphone ISP on Mobile GPUs with Deep Learning, Mobile AI & AIM 2022 Challenge: Report

The target was to develop an end-to-end AI-based image signal processing (ISP) pipeline that replaces the standard mobile ISP and can run on modern smartphone GPUs using TensorFlow Lite.

Real-Time Video Super-Resolution on Smartphones with Deep Learning, Mobile AI 2021 Challenge: Report

The target is to develop end-to-end deep learning-based video super-resolution solutions that achieve real-time performance on mobile GPUs and can upscale videos to HD resolution at up to 80 FPS while demonstrating high-fidelity results.

Fast Camera Image Denoising on Mobile GPUs with Deep Learning, Mobile AI 2021 Challenge: Report

The first Mobile AI challenge is introduced, where the target is to develop an end-to-end deep learning-based image denoising solution that can demonstrate high efficiency on smartphone GPUs.

Comparison and Benchmarking of AI Models and Frameworks on Mobile Devices

A benchmark suite, AIoTBench, is presented that focuses on evaluating the inference abilities of mobile and embedded devices, and two unified metrics are proposed as AI scores: Valid Images Per Second (VIPS) and Valid FLOPs Per Second (VOPS).
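
Under a plain reading of these definitions, only inferences whose predictions are correct ("valid") count toward throughput, so both scores reduce to simple arithmetic; all numbers below are made up for illustration.

    # Hypothetical run: 1000 images, 910 correct predictions, 25 s wall clock.
    correct = 910
    seconds = 25.0
    flops_per_image = 0.57e9  # assumed per-inference cost (~0.57 GFLOPs)

    vips = correct / seconds                    # Valid Images Per Second
    vops = correct * flops_per_image / seconds  # Valid FLOPs Per Second
    print(f"VIPS = {vips:.1f} images/s, VOPS = {vops / 1e9:.1f} GFLOPs/s")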

CitiusSynapse: A Deep Learning Framework for Embedded Systems

A deep learning framework specialized for embedded systems with limited resources, whose operation-processing structure differs from that of standard PCs, is proposed.
...

References

Showing 1-10 of 76 references.

AI Benchmark: Running Deep Neural Networks on Android Smartphones

A study of the current state of deep learning in the Android ecosystem that describes available frameworks, programming models, and the limitations of running AI on smartphones, along with an overview of the hardware acceleration resources available on the four main mobile chipset platforms.

On-Device Neural Net Inference with Mobile GPUs

This paper presents how the mobile GPU, a ubiquitous hardware accelerator available on virtually every phone, is leveraged to run inference of deep neural networks in real time on both Android and iOS devices, and discusses how to design networks that are mobile-GPU-friendly.
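
From Python, attaching TensorFlow Lite's GPU delegate looks roughly like the sketch below; the delegate binary name is the Android one, and both file paths are assumptions for illustration.

    import tensorflow as tf

    # Load the GPU delegate shared library (Android name shown; it differs
    # per platform) and hand it to the interpreter.
    gpu = tf.lite.experimental.load_delegate("libtensorflowlite_gpu_delegate.so")
    interpreter = tf.lite.Interpreter(
        model_path="model.tflite",  # hypothetical model file
        experimental_delegates=[gpu])
    interpreter.allocate_tensors()
    # Ops the delegate supports run on the GPU; the rest fall back to the CPU.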

7.1 An 11.5TOPS/W 1024-MAC Butterfly Structure Dual-Core Sparsity-Aware Neural Processing Unit in 8nm Flagship Mobile SoC

For mobile systems-on-a-chip (SoCs), energy-efficient neural processing units (NPU) have been studied for performing the convolutional layers (CLs) and fully-connected layers (FCLs) in deep neural networks.

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

A quantization scheme is proposed that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating-point inference on commonly available integer-only hardware.
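
The scheme represents each real value r as r = S * (q - Z), with an integer q, a real scale S, and an integer zero point Z that makes r = 0 exactly representable. A small NumPy sketch of this affine mapping, with made-up values:

    import numpy as np

    r = np.array([-1.0, -0.2, 0.0, 0.4, 1.5], dtype=np.float32)

    # Derive S and Z from the tensor's observed range.
    q_min, q_max = -128, 127  # int8 range
    S = (r.max() - r.min()) / (q_max - q_min)
    Z = int(round(q_min - r.min() / S))  # zero point: r = 0 maps exactly

    q = np.clip(np.round(r / S) + Z, q_min, q_max).astype(np.int8)
    r_hat = S * (q.astype(np.float32) - Z)  # dequantized approximation
    print(q, r_hat)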

Rethinking the Inception Architecture for Computer Vision

This work explores ways to scale up networks that aim to utilize the added computation as efficiently as possible through suitably factorized convolutions and aggressive regularization.
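
A central factorization in that work replaces a 5x5 convolution with two stacked 3x3 convolutions that cover the same receptive field with fewer weights (25 vs 18 per filter position). A toy Keras comparison; the layer sizes are arbitrary, not the Inception-v3 configuration:

    import tensorflow as tf

    c = 32  # arbitrary channel count
    five_by_five = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(35, 35, c)),
        tf.keras.layers.Conv2D(c, 5, padding="same", activation="relu"),
    ])
    factorized = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(35, 35, c)),
        tf.keras.layers.Conv2D(c, 3, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(c, 3, padding="same", activation="relu"),
    ])
    # Same receptive field, ~28% fewer weights for the factorized stack.
    print(five_by_five.count_params(), factorized.count_params())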

cuDNN: Efficient Primitives for Deep Learning

A library similar in intent to BLAS, with optimized routines for deep learning workloads; it contains routines for GPUs and, like BLAS, could be implemented for other platforms.

Quantizing deep convolutional networks for efficient inference: A whitepaper

An overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations is presented, and per-channel quantization of weights combined with per-layer quantization of activations is recommended as the preferred quantization scheme for hardware acceleration and kernel optimization.
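
The difference between the two granularities is easy to see in a few lines of NumPy: per-layer (per-tensor) quantization derives one scale from the whole weight tensor, while per-channel quantization derives one scale per output channel, tracking each filter's own range. Shapes and values below are illustrative only.

    import numpy as np

    rng = np.random.default_rng(0)
    # Toy weights, shape (out_channels=3, 3, 3), with deliberately different
    # per-channel value ranges.
    weights = np.stack(
        [rng.normal(scale=s, size=(3, 3)) for s in (0.02, 0.2, 2.0)])

    # Per-layer symmetric int8 scale: a single number for the whole tensor.
    s_layer = np.abs(weights).max() / 127.0

    # Per-channel symmetric scales: one per output channel (leading axis).
    s_channel = np.abs(weights).reshape(3, -1).max(axis=1) / 127.0

    print("per-layer scale:   ", s_layer)    # dominated by the widest channel
    print("per-channel scales:", s_channel)  # narrow channels keep resolution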

Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors

A unified implementation of the Faster R-CNN, R-FCN and SSD systems is presented, and the speed/accuracy trade-off curve is traced out by using alternative feature extractors and varying other critical parameters, such as image size, within each of these meta-architectures.

PIRM Challenge on Perceptual Image Enhancement on Smartphones: Report

This paper reviews the first challenge on efficient perceptual image enhancement, with a focus on deploying deep learning models on smartphones, and presents solutions that significantly improved baseline results, defining the state of the art for image enhancement on smartphones.

Flexible, High Performance Convolutional Neural Networks for Image Classification

We present a fast, fully parameterizable GPU implementation of Convolutional Neural Network variants. Our feature extractors are neither carefully designed nor pre-wired, but rather learned in a supervised way.
...