Distributed Deep Convolutional Neural Networks for the Internet-of-Things

Simone Disabato, Manuel Roveri, Cesare Alippi. IEEE Transactions on Computers.
The severe memory and computation constraints characterizing Internet-of-Things (IoT) units may prevent the execution of Deep Learning (DL)-based solutions, which typically demand large memory and a high processing load. To support real-time execution of the considered DL model at the IoT-unit level, DL solutions must be designed with the memory and processing constraints of the chosen IoT technology in mind. In this article, we introduce a design methodology…
RL-PDNN: Reinforcement Learning for Privacy-Aware Distributed Neural Networks in IoT Systems
This paper introduces a methodology for distributing DNN tasks onto the resource-constrained devices of an IoT system while avoiding revealing the model to participants, and frames the approach as a reinforcement learning design suited to real-time applications and highly dynamic systems, named RL-PDNN.
Exploring compression and parallelization techniques for distribution of deep neural networks over Edge-Fog continuum - a review
The review uncovers significant issues and possible future directions for adopting deep models as processing engines for real-time IoT, and introduces a novel parallelization approach for establishing a distributed-systems view of DL for IoT.
DistPrivacy: Privacy-Aware Distributed Deep Neural Networks in IoT surveillance systems
This paper introduces a methodology for securing sensitive data by rethinking the distribution strategy without adding any computation overhead, and presents an online heuristic that supports heterogeneous IoT devices as well as multiple DNNs and datasets, making the pervasive system a general-purpose platform for privacy-aware, low-decision-latency applications.
Distributed CNN Inference on Resource-Constrained UAVs for Surveillance Systems: Design and Optimization
A DNN distribution methodology within UAVs is proposed to enable data classification on resource-constrained devices and to avoid the extra delays that server-based solutions introduce through data communication over air-to-ground links.
Efficient Real-Time Image Recognition Using Collaborative Swarm of UAVs and Convolutional Networks
This work presents a strategy for distributing inference requests to a swarm of resource-constrained UAVs that classify captured images on board, and formulates the model as an optimization problem that minimizes the latency between acquiring images and making the final decisions.
Artificial Intelligence Techniques for Cognitive Sensing in Future IoT: State-of-the-Art, Potentials, and Challenges
A survey of the Artificial Intelligence (AI)-based techniques used over the last decade to provide cognitive sensing solutions for different Future IoT (FIoT) applications is provided, together with state-of-the-art approaches, potentials, and challenges of AI techniques for the identified solutions.
A Calibrated Multiexit Neural Network for Detecting Urothelial Cancer Cells
A novel deep learning model for cancer detection from urinary cytopathology screening images is proposed, and it is shown that the combination of focal loss, multiple outputs, and temperature scaling provides a model that is significantly more accurate and calibrated than a baseline deep convolutional network.


Memory- and Communication-Aware Model Compression for Distributed Deep Learning Inference on IoT
This work proposes Network of Neural Networks (NoNN), a new distributed IoT learning paradigm that compresses a large pretrained ‘teacher’ deep network into several disjoint and highly-compressed ‘student’ modules, without loss of accuracy.
DeepThings: Distributed Adaptive Deep Learning Inference on Resource-Constrained IoT Edge Clusters
This paper proposes DeepThings, a framework for adaptive, distributed execution of CNN-based inference applications on tightly resource-constrained IoT edge clusters; it employs a scalable Fused Tile Partitioning of convolutional layers to minimize memory footprint while exposing parallelism.
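The tile-partitioning idea above can be illustrated with a small sketch (hypothetical code, not the DeepThings implementation): to compute one tile of the output feature map independently on a device, you must map it back through the stacked convolutional layers to the input region it depends on.

```python
# Minimal sketch of mapping an output tile back to its required input
# region across a stack of convolutional layers. Each layer is described
# as a (kernel, stride) pair; padding is omitted for simplicity.

def input_range_for_tile(out_lo, out_hi, layers):
    """Given the output index range [out_lo, out_hi) produced after a
    stack of conv layers, return the input index range required to
    compute that tile, walking the layers back to front."""
    lo, hi = out_lo, out_hi
    for kernel, stride in reversed(layers):
        lo = lo * stride
        hi = (hi - 1) * stride + kernel
    return lo, hi

# Two 3x3 conv layers with stride 1: a 4-wide output tile needs
# an 8-wide input strip (the receptive field grows by 2 per layer).
layers = [(3, 1), (3, 1)]
print(input_range_for_tile(0, 4, layers))  # -> (0, 8)
```

Because adjacent tiles' input regions overlap by the receptive-field growth, a tile partitioning trades some redundant computation for the ability to run each tile on a separate memory-limited device.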
Moving Convolutional Neural Networks to Embedded Systems: The AlexNet and VGG-16 Case
This paper introduces a methodology for designing and porting CNNs to resource-limited embedded systems, employing approximate computing techniques to reduce the computational load and memory occupation of the deep learning architecture by trading accuracy for memory and computation savings.
Distributed Deep Neural Networks Over the Cloud, the Edge and End Devices
Compared with the traditional method of offloading raw sensor data for processing in the cloud, DDNN processes most sensor data locally on end devices while achieving high accuracy, reducing the communication cost by a factor of more than 20x.
Reducing the Computation Load of Convolutional Neural Networks through Gate Classification
The core of this novel family of CNNs is the presence of Gate-Classification layers that allow the input image to be processed incrementally through the CNN layers, taking a decision as soon as “enough confidence” about the classification is gained and thus avoiding processing the whole CNN when not needed.
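The gate-classification idea above can be sketched as a confidence-based early exit (a hypothetical illustration, not the paper's code): after each stage of the network, a small gate classifier produces class scores, and inference stops as soon as the top softmax probability clears a threshold.

```python
import math

# Minimal sketch of confidence-gated early-exit inference. `stages` are
# the network's feature-extraction blocks; each `gate` maps the current
# features to class logits. Inference stops at the first gate whose top
# softmax probability reaches the threshold.

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def gated_inference(x, stages, gates, threshold=0.9):
    """Return (predicted class, number of stages executed)."""
    for i, (stage, gate) in enumerate(zip(stages, gates), start=1):
        x = stage(x)
        probs = softmax(gate(x))
        best = max(range(len(probs)), key=probs.__getitem__)
        # Exit early when confident, or unconditionally at the last stage.
        if probs[best] >= threshold or i == len(stages):
            return best, i
```

Easy inputs exit at an early gate and pay only a fraction of the full network's cost; hard inputs fall through to the final classifier.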
Deep Learning for IoT Big Data and Streaming Analytics: A Survey
A thorough overview on using a class of advanced machine learning techniques, namely deep learning (DL), to facilitate the analytics and learning in the IoT domain, and a discussion of why DL is a promising approach to achieve the desired analytics in these types of data and applications.
eSGD: Communication Efficient Distributed Deep Learning on the Edge
This work proposes a new method called edge Stochastic Gradient Descent (eSGD) for scaling up edge training of convolutional neural networks, including two mechanisms that improve first-order gradient-based optimization of stochastic objective functions in edge scenarios.
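The general communication-saving idea behind methods of this kind can be sketched as follows (a hypothetical illustration; eSGD's actual mechanisms differ in detail): transmit only the largest-magnitude gradient entries each step, and accumulate the untransmitted remainder locally as a residual so no update is permanently lost.

```python
# Minimal sketch of top-k gradient sparsification with residual
# accumulation, a common building block of communication-efficient
# distributed SGD on bandwidth-limited edge devices.

def sparsify(grad, residual, k):
    """Add the local residual to the fresh gradient, send the k entries
    with the largest magnitude, and keep the rest as the new residual."""
    acc = [g + r for g, r in zip(grad, residual)]
    top = sorted(range(len(acc)), key=lambda i: abs(acc[i]), reverse=True)[:k]
    sent = {i: acc[i] for i in top}  # sparse update actually transmitted
    new_residual = [0.0 if i in sent else acc[i] for i in range(len(acc))]
    return sent, new_residual

grad = [0.5, -2.0, 0.1, 1.5]
sent, res = sparsify(grad, [0.0] * 4, k=2)
print(sent)  # -> {1: -2.0, 3: 1.5}
```

With k much smaller than the gradient dimension, each step transmits only a few (index, value) pairs instead of the full dense gradient.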
Distributed deep learning on edge-devices: Feasibility via adaptive compression
It is shown that distributed deep learning computation on WAN-connected devices is feasible, in spite of the traffic caused by learning tasks, and that such a setup raises some important challenges, most notably the ingress traffic that the servers hosting the up-to-date model have to sustain.
YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights
A hardware accelerator optimized for BinaryConnect CNNs is presented that achieves 1510 GOp/s on a core area of only 1.33 MGE, with a power dissipation of 153 mW in UMC 65 nm technology at 1.2 V.
DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices
The proposed optimization operations significantly accelerate model execution and greatly reduce run-time memory cost, since the slimmed model architecture contains fewer hidden layers.