It's always personal: Using Early Exits for Efficient On-Device CNN Personalisation

Ilias Leontiadis, Stefanos Laskaridis, Stylianos I. Venieris and Nicholas D. Lane. Proceedings of the 22nd International Workshop on Mobile Computing Systems and Applications.
On-device machine learning is becoming a reality thanks to the availability of powerful hardware and model compression techniques. Typically, these models are pretrained on large GPU clusters and have enough parameters to generalise across a wide variety of inputs. In this work, we observe that a much smaller, personalised model can be employed to fit a specific scenario, resulting in both higher accuracy and faster execution. Nevertheless, on-device training is extremely challenging, imposing…


Smart at what cost?: characterising mobile deep neural networks in the wild
GaugeNN, a tool that automates the deployment, measurement and analysis of DNNs on devices with support for different frameworks and platforms, is developed, showing the gap between bespoke techniques and real-world deployments and the need for optimised deployment of deep learning models in a highly dynamic and heterogeneous ecosystem.
perf4sight: A toolflow to model CNN training performance on Edge GPUs
  • A. Rajagopal, C. Bouganis
  • Computer Science
  • 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
  • 2021
The increased memory and processing capabilities of today’s edge devices create opportunities for greater edge intelligence. In the domain of vision, the ability to adapt a Convolutional Neural…
How to Reach Real-Time AI on Consumer Devices? Solutions for Programmable and Custom Architectures
Preliminary answers to this potentially game-changing question are provided by presenting an array of design techniques for efficient AI systems and examining the major roadblocks when targeting both programmable processors and custom accelerators.
Adaptive Inference through Early-Exit Networks: Design, Challenges and Directions
This paper decomposes the design methodology of early-exit networks into its key components and surveys the recent advances in each of them, positioning early-exiting against other efficient inference solutions and providing insights on the current challenges and most promising future directions for research in the field.
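The early-exit pattern these works build on is compact enough to sketch. Below is a minimal, framework-free illustration (the `stages` interface, the threshold value and all function names are hypothetical, not any paper's API): intermediate classifiers are evaluated in order, and inference stops at the first one whose top-1 confidence passes a threshold.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def early_exit_predict(stages, x, threshold=0.9):
    """Run `stages` (each: features -> (features, logits)) in order,
    exiting at the first classifier whose top-1 confidence reaches
    `threshold`. Returns (predicted_class, exit_index)."""
    feats = x
    probs = None
    for i, stage in enumerate(stages):
        feats, logits = stage(feats)
        probs = softmax(logits)
        if max(probs) >= threshold:
            return probs.index(max(probs)), i
    # No exit fired: fall back to the final classifier's prediction.
    return probs.index(max(probs)), len(stages) - 1
```

Easy inputs exit early and skip the remaining backbone computation; hard inputs fall through to the final classifier, which is where the latency/accuracy trade-off of these networks comes from.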
Multi-Exit Semantic Segmentation Networks
A framework that converts state-of-the-art segmentation models to MESS networks (specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples) and co-optimises the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to device capabilities and application-specific requirements.
On-device Federated Learning with Flower
This paper presents an exploration of on-device FL on various smartphones and embedded devices using the Flower framework, evaluates the system costs, and discusses how this quantification could be used to design more efficient FL algorithms.


Now that I can see, I can improve: Enabling data-driven finetuning of CNNs on the edge
  • A. Rajagopal, C. Bouganis
  • Computer Science
  • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
  • 2020
The results show that, on average, data-aware pruning with retraining can provide a 10.2pp accuracy increase over a wide range of subsets, networks and pruning levels, with a maximum improvement of 42.0pp over pruning and retraining performed agnostically to the data being processed by the network.
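A rough sketch of the data-aware idea (a deliberate simplification, not this paper's actual criterion; all names below are hypothetical): rank filters by their mean absolute activation on the specific data subset the deployed network actually sees, then prune the least-activated ones before retraining.

```python
def rank_filters_by_activation(activations):
    """activations: per-sample lists of per-filter mean absolute
    activations, collected on the target data subset.
    Returns filter indices sorted least-important first."""
    n_filters = len(activations[0])
    scores = [0.0] * n_filters
    for sample in activations:
        for f, a in enumerate(sample):
            scores[f] += abs(a)
    return sorted(range(n_filters), key=lambda f: scores[f])

def prune_least_important(activations, prune_ratio=0.5):
    """Return the set of filter indices to remove."""
    order = rank_filters_by_activation(activations)
    k = int(len(order) * prune_ratio)
    return set(order[:k])
```

The point of making the criterion data-dependent is that a filter useless for the classes a given device encounters can be removed even if it mattered for the full training distribution.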
Communication-Efficient Learning of Deep Networks from Decentralized Data
This work presents a practical method for the federated learning of deep networks based on iterative model averaging, and conducts an extensive empirical evaluation, considering five different model architectures and four datasets.
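The iterative model-averaging step at the heart of this method (FedAvg) reduces, per round, to a sample-count-weighted average of the clients' locally trained parameters. A minimal sketch on flat parameter lists (real implementations average per-layer tensors, but the arithmetic is the same):

```python
def fed_avg(client_weights, client_sizes):
    """Weighted average of flat parameter vectors, one per client,
    weighted by each client's number of local training samples."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    avg = [0.0] * n_params
    for w, n in zip(client_weights, client_sizes):
        for i, p in enumerate(w):
            avg[i] += p * n / total
    return avg
```

Weighting by local dataset size means clients with more data pull the global model harder, which is what makes the averaged model approximate training on the pooled (but never centralised) data.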
HAPI: Hardware-Aware Progressive Inference
This work presents HAPI, a novel methodology for generating high-performance early-exit networks by co-optimising the placement of intermediate exits together with the early-exit strategy at inference time, and proposes an efficient design space exploration algorithm which enables faster traversal of a large number of alternative architectures and generates the highest-performing design.
EmBench: Quantifying Performance Variations of Deep Neural Networks across Modern Commodity Devices
This work attempts to demystify the deep neural network landscape by systematically evaluating a collection of state-of-the-art DNNs on a wide variety of commodity devices and identifies potential bottlenecks in each architecture.
AI Benchmark: All About Deep Learning on Smartphones in 2019
This paper evaluates the performance and compares the results of all chipsets from Qualcomm, HiSilicon, Samsung, MediaTek and Unisoc that are providing hardware acceleration for AI inference and discusses the recent changes in the Android ML pipeline.
MobileNetV2: Inverted Residuals and Linear Bottlenecks
A new mobile architecture, MobileNetV2, is described that improves the state-of-the-art performance of mobile models on multiple tasks and benchmarks, as well as across a spectrum of different model sizes, and allows decoupling of the input/output domains from the expressiveness of the transformation.
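The "decoupling" claim can be made concrete with parameter counting. An inverted residual block expands a narrow input to a wide hidden representation, but that wide representation only passes through a cheap per-channel depthwise convolution, so the block stays far smaller than a standard 3×3 convolution operating at the same hidden width. A back-of-the-envelope sketch (bias and batch-norm parameters omitted for simplicity):

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out

def inverted_residual_params(c_in, c_out, expansion=6, k=3):
    """Weights in an inverted residual block:
    1x1 expansion -> k x k depthwise -> 1x1 linear projection."""
    hidden = expansion * c_in
    expand = c_in * hidden       # 1x1 pointwise expansion
    depthwise = k * k * hidden   # one k x k filter per channel
    project = hidden * c_out     # 1x1 linear bottleneck
    return expand + depthwise + project
```

For c_in = c_out = 32 with expansion 6 (hidden width 192), the block has 14,016 weights, versus 331,776 for a standard 3×3 convolution applied directly at width 192: the narrow bottlenecks carry the data between blocks while the wide, cheap depthwise stage carries the expressiveness.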
Multi-Scale Dense Networks for Resource Efficient Image Classification
Experiments demonstrate that the proposed framework substantially improves the existing state-of-the-art in both image classification with computational resource limits at test time and budgeted batch classification.
SCAN: A Scalable Neural Networks Framework Towards Compact and Efficient Models
The SCAN framework for network training and inference is proposed, which is orthogonal and complementary to existing acceleration and compression methods, together with a threshold-controlled scalable inference mechanism that approaches human-like sample-specific inference.
SPINN: synergistic progressive inference of neural networks over device and cloud
SPINN is proposed, a distributed inference system that employs synergistic device-cloud computation together with a progressive inference method to deliver fast and robust CNN inference across diverse settings, and provides robust operation under uncertain connectivity conditions and significant energy savings compared to cloud-centric execution.
Distilling the Knowledge in a Neural Network
This work shows that it can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model and introduces a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse.
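The soft-target mechanism described here is compact enough to sketch: teacher logits are softened with a temperature T, and the student is trained to match the resulting distribution via cross-entropy. A minimal illustration (the hard-label term and the T² gradient scaling of the full KD objective are omitted):

```python
import math

def softmax_T(logits, T):
    """Softmax with temperature T; higher T gives softer targets."""
    m = max(x / T for x in logits)
    exps = [math.exp(x / T - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Cross-entropy between the temperature-softened teacher and
    student distributions: the 'soft target' term of distillation."""
    p = softmax_T(teacher_logits, T)   # teacher soft targets
    q = softmax_T(student_logits, T)   # student soft predictions
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

The softened targets expose the teacher's relative probabilities over wrong classes (its "dark knowledge"), which is the extra training signal a small student gets beyond one-hot labels.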