DynO: Dynamic Onloading of Deep Neural Networks from Cloud to Device

@article{almeida2022dyno,
  title={DynO: Dynamic Onloading of Deep Neural Networks from Cloud to Device},
  author={M{\'a}rio Almeida and Stefanos Laskaridis and Stylianos I. Venieris and Ilias Leontiadis and Nicholas D. Lane},
  journal={ACM Transactions on Embedded Computing Systems (TECS)},
  year={2022}
}
Recently, there has been an explosive growth of mobile and embedded applications using convolutional neural networks (CNNs). To alleviate their excessive computational demands, developers have traditionally resorted to cloud offloading, inducing high infrastructure costs and a strong dependence on networking conditions. On the other end, the emergence of powerful SoCs is gradually enabling on-device execution. Nonetheless, low- and mid-tier platforms still struggle to run state-of-the-art CNNs… 
Context-Aware Compilation of DNN Training Pipelines across Edge and Cloud
Experimental results show that the proposed pipeline training framework not only significantly speeds up training, but also incurs little accuracy loss or additional memory/energy overhead, delivering a practical and efficient solution to edge-cloud model training.
How to Reach Real-Time AI on Consumer Devices? Solutions for Programmable and Custom Architectures
Preliminary answers to this potentially game-changing question are provided through an array of design techniques for efficient AI systems, examining the major roadblocks when targeting both programmable processors and custom accelerators.
Edge-PRUNE: Flexible Distributed Deep Learning Inference
Compared to previous approaches, Edge-PRUNE is based on a formal dataflow computing model, and is agnostic towards machine learning training frameworks, offering at the same time wide support for leveraging deep learning accelerators such as embedded GPUs.
OODIn: An Optimised On-Device Inference Framework for Heterogeneous Mobile Devices
Radical progress in the field of deep learning (DL) has led to unprecedented accuracy in diverse inference tasks. As such, deploying DL models across mobile platforms is vital to enable the…
Smart at what cost?: characterising mobile deep neural networks in the wild
The authors develop gaugeNN, a tool that automates the deployment, measurement, and analysis of DNNs on devices, with support for different frameworks and platforms, showing the gap between bespoke techniques and real-world deployments and the need for optimised deployment of deep learning models in a highly dynamic and heterogeneous ecosystem.
Fault-Tolerant Collaborative Inference through the Edge-PRUNE Framework
The experimental section of this work shows results on achievable inference time savings by collaborative inference, presents fault tolerant system topologies and analyzes their cost in terms of execution time overhead.
Special Session: Towards an Agile Design Methodology for Efficient, Reliable, and Secure ML Systems
The main challenges in agile development of efficient, reliable, and secure ML systems are summarized, and an outline of an agile design methodology to generate efficient, reliable, and secure ML systems based on user-defined constraints and objectives is presented.


SPINN: synergistic progressive inference of neural networks over device and cloud
SPINN is proposed, a distributed inference system that employs synergistic device-cloud computation together with a progressive inference method to deliver fast and robust CNN inference across diverse settings, providing reliable operation under uncertain connectivity conditions and significant energy savings compared to cloud-centric execution.
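The core idea behind progressive device-cloud inference can be illustrated with a minimal sketch (not the authors' implementation; the callables and the 0.8 confidence threshold are hypothetical placeholders): run the device-side part of the model, and only offload to the cloud when a cheap early-exit classifier is not confident enough.

```python
# Minimal sketch of synergistic progressive inference in the spirit of
# SPINN. device_part, early_exit and cloud_part are placeholder
# callables standing in for model partitions; they are assumptions,
# not the paper's API.
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def progressive_infer(x, device_part, early_exit, cloud_part,
                      conf_threshold=0.8):
    features = device_part(x)              # runs on-device
    probs = softmax(early_exit(features))  # cheap early-exit head
    if max(probs) >= conf_threshold:       # confident: answer locally
        return probs.index(max(probs)), "device"
    return cloud_part(features), "cloud"   # otherwise offload features
```

With a confident early exit the sample never leaves the device, which is what yields the energy and latency savings under poor connectivity.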
JALAD: Joint Accuracy-And Latency-Aware Deep Structure Decoupling for Edge-Cloud Execution
JALAD is proposed, a joint accuracy- and latency-aware execution framework, which decouples a deep neural network so that a part of it will run at edge devices and the other part inside the conventional cloud, while only a minimum amount of data has to be transferred between them.
MoDNN: Local distributed mobile computing system for Deep Neural Network
MoDNN is proposed, a local distributed mobile computing system for DNN applications that can partition already-trained DNN models onto several mobile devices to accelerate DNN computations by alleviating device-level computing cost and memory usage.
DeepThings: Distributed Adaptive Deep Learning Inference on Resource-Constrained IoT Edge Clusters
DeepThings is proposed, a framework for adaptively distributed execution of CNN-based inference applications on tightly resource-constrained IoT edge clusters that employs a scalable Fused Tile Partitioning of convolutional layers to minimize memory footprint while exposing parallelism.
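Grid-style tile partitioning of the kind used by fused-tile approaches can be sketched as follows (an illustrative toy, not DeepThings' implementation; the halo parameter stands in for the receptive-field overlap that fused convolutional layers require):

```python
# Illustrative sketch of tile partitioning for distributed CNN
# inference: each output tile needs an input region enlarged by a
# "halo" covering the fused layers' receptive field.
def tile_regions(width, height, grid, halo):
    """Return per-tile input regions (x0, y0, x1, y1), clamped to the
    feature-map bounds, for a grid x grid partitioning."""
    tiles = []
    tw, th = width // grid, height // grid
    for gy in range(grid):
        for gx in range(grid):
            x0 = max(0, gx * tw - halo)
            y0 = max(0, gy * th - halo)
            x1 = min(width, (gx + 1) * tw + halo)
            y1 = min(height, (gy + 1) * th + halo)
            tiles.append((x0, y0, x1, y1))
    return tiles
```

Each region can then be dispatched to a different edge node; the overlap (halo) is the price paid for avoiding inter-node communication between fused layers.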
EmBench: Quantifying Performance Variations of Deep Neural Networks across Modern Commodity Devices
This work attempts to demystify the deep neural network landscape by systematically evaluating a collection of state-of-the-art DNNs on a wide variety of commodity devices and identifies potential bottlenecks in each architecture.
CLIO: enabling automatic compilation of deep learning pipelines across IoT and cloud
Clio presents a novel approach to split machine learning models between an IoT device and cloud in a progressive manner that adapts to wireless dynamics and can be combined with model compression and adaptive model partitioning to create an integrated system for IoT-cloud partitioning.
Dynamic Adaptive DNN Surgery for Inference Acceleration on the Edge
The DNN surgery approach is designed, which allows a partitioned DNN to be processed at both the edge and the cloud while limiting data transmission, along with a Dynamic Adaptive DNN Surgery (DADS) scheme that optimally partitions the DNN under different network conditions.
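The latency-aware split-point selection shared by DADS, JALAD and similar systems can be sketched with a toy exhaustive search (all numbers and names below are illustrative assumptions, not profiles from the papers): for each candidate split, sum the device-side compute time, the transfer time of the intermediate tensor at the current bandwidth, and the cloud-side compute time, and keep the minimum.

```python
# Hypothetical sketch of latency-aware DNN partitioning: choose the
# split layer minimising device time + transfer time + cloud time.
def best_split(device_ms, cloud_ms, sizes_bytes, bandwidth_bps):
    """device_ms[i]/cloud_ms[i]: per-layer latency (ms) on each side.
    sizes_bytes[k]: bytes sent when splitting at k (sizes_bytes[0] is
    the raw input; sizes_bytes[k] the output of layer k for k >= 1).
    k = 0 means full offload; k = n means fully on-device."""
    n = len(device_ms)
    best_k, best_cost = None, float("inf")
    for k in range(n + 1):
        dev = sum(device_ms[:k])
        cld = sum(cloud_ms[k:])
        xfer = 0.0 if k == n else sizes_bytes[k] * 8 / bandwidth_bps * 1000
        cost = dev + xfer + cld
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost
```

Re-running this search whenever the measured bandwidth changes is what makes such schemes dynamic: under fast links full offload wins, while under slow links the split drifts toward on-device execution.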
Edge AI: On-Demand Accelerating Deep Neural Network Inference via Edge Computing
Edgent is a framework that leverages edge computing for DNN collaborative inference through device-edge synergy, generating the best execution plan through an online change point detection algorithm that maps the current bandwidth state to the optimal configuration.
MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints
This work describes how several common DNNs, when subjected to state-of-the-art optimizations, trade off accuracy for resource use such as memory, computation, and energy, and introduces two new and powerful DNN optimizations that exploit this trade-off.
Elf: accelerate high-resolution mobile deep vision with content-aware parallel offloading
The design of Elf is presented, a framework to accelerate mobile deep vision applications with any server provisioning through parallel offloading; it employs a recurrent region proposal prediction algorithm, region-proposal-centric frame partitioning, and a resource-aware multi-offloading scheme.