Communication-Computation Trade-off in Resource-Constrained Edge Inference

@article{Shao2020CommunicationComputationTI,
  title={Communication-Computation Trade-off in Resource-Constrained Edge Inference},
  author={Jiawei Shao and Jun Zhang},
  journal={IEEE Communications Magazine},
  year={2020},
  volume={58},
  pages={20-26}
}
The recent breakthrough in artificial intelligence (AI), especially deep neural networks (DNNs), has affected every branch of science and technology. Particularly, edge AI has been envisioned as a major application scenario to provide DNN-based services at edge devices. This article presents effective methods for edge inference at resource-constrained devices. It focuses on device-edge co-inference, assisted by an edge computing server, and investigates a critical trade-off among the… 
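To make the co-inference setting concrete, below is a minimal sketch of device-edge co-inference via model splitting: the device runs the first few layers and uploads the intermediate feature, and the edge server finishes the inference. The backbone, split point, and tensor sizes are illustrative assumptions, not the scheme proposed in the article.

# Minimal sketch of device-edge co-inference via model splitting (illustrative only).
# The backbone, split_layer index, and sizes are assumptions for the example,
# not the method studied in the article.
import torch
import torch.nn as nn

class SplitModel(nn.Module):
    def __init__(self, split_layer: int):
        super().__init__()
        backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 10),
        )
        # On-device part: layers up to the split point.
        self.device_part = backbone[:split_layer]
        # Edge-server part: the remaining layers.
        self.edge_part = backbone[split_layer:]

def device_forward(model: SplitModel, x: torch.Tensor) -> torch.Tensor:
    # Runs on the resource-constrained device; the output is the feature to upload.
    return model.device_part(x)

def edge_forward(model: SplitModel, feature: torch.Tensor) -> torch.Tensor:
    # Runs on the edge server after receiving the feature.
    return model.edge_part(feature)

if __name__ == "__main__":
    model = SplitModel(split_layer=4)
    x = torch.randn(1, 3, 32, 32)
    feature = device_forward(model, x)   # transmitted over the wireless link
    print("uploaded feature size (floats):", feature.numel())
    logits = edge_forward(model, feature)
    print("edge-side prediction:", logits.argmax(dim=1).item())

Moving the split point deeper generally shifts load from the uplink to the device, which is the communication-computation trade-off the article investigates.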

Citations

Capacity of Remote Classification Over Wireless Channels
TLDR
Adopting a subspace data model, this work proves the equivalence of classification capacity maximization to the problem of packing on the Grassmann manifold and shows that the classification capacity grows exponentially with the instantaneous communication rate, and super-exponentially with the dimensions of each data cluster.
Edge Artificial Intelligence for 6G: Vision, Enabling Technologies, and Applications
TLDR
This article provides a vision for scalable and trustworthy edge AI systems with an integrated design of wireless communication strategies and decentralized machine learning models, and describes new design principles for wireless networks, service-driven resource allocation optimization methods, and a holistic end-to-end system architecture to support edge AI.
Communication-Computation Efficient Device-Edge Co-Inference via AutoML
TLDR
The task of selecting a suitable model split point and a pair of encoder/decoder for the intermediate feature vector is cast as a sequential decision problem, for which a novel automated machine learning (AutoML) framework based on deep reinforcement learning (DRL) is proposed.
Progressive Feature Transmission for Split Inference at the Wireless Edge
TLDR
The progressive feature transmission (ProgressFTX) protocol is proposed, which minimizes the overhead by progressively transmitting features until a target confidence level is reached, and can substantially reduce the communication latency compared to conventional feature pruning and random feature transmission.
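As a rough illustration of the confidence-gated idea behind progressive transmission (not the ProgressFTX protocol itself), the sketch below uploads a feature vector a few dimensions per round and stops once a stand-in server-side classifier reaches a target confidence; the chunk size, classifier, and threshold are assumptions.

# Minimal sketch of confidence-gated progressive feature transmission (illustrative only).
import torch
import torch.nn.functional as F

def progressive_transmit(feature: torch.Tensor, classifier, chunk: int = 16,
                         target_conf: float = 0.9):
    """Send the feature a few dimensions at a time; stop once the server-side
    classifier is confident enough, saving the remaining communication."""
    received = torch.zeros_like(feature)
    for start in range(0, feature.numel(), chunk):
        received[start:start + chunk] = feature[start:start + chunk]  # one uplink round
        probs = F.softmax(classifier(received.unsqueeze(0)), dim=1)
        conf, pred = probs.max(dim=1)
        if conf.item() >= target_conf:
            return pred.item(), start + chunk        # early stop: enough features sent
    return pred.item(), feature.numel()              # fell back to full transmission

if __name__ == "__main__":
    torch.manual_seed(0)
    classifier = torch.nn.Linear(64, 10)             # stand-in for the server-side model
    feature = torch.randn(64)
    label, dims_sent = progressive_transmit(feature, classifier)
    print(f"predicted class {label} after sending {dims_sent}/64 feature dimensions")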
Single-Training Collaborative Object Detectors Adaptive to Bandwidth and Computation
TLDR
This work introduces the first configurable solution for object detection that manages the triple communication-computation-accuracy trade-off with a single set of weights, adding only a minor penalty on the base EfficientDet-D2 architecture.
CAROL: Confidence-Aware Resilience Model for Edge Federations
TLDR
CAROL, a confidence-aware resilience model, utilizes a memory-efficient generative neural network to predict the Quality of Service (QoS) of a future state together with a confidence score for each prediction, and outperforms state-of-the-art resilience schemes.
Communication-Oriented Model Fine-Tuning for Packet-Loss Resilient Distributed Inference Under Highly Lossy IoT Networks
TLDR
This work proposes communication-oriented model tuning (COMtune), which aims to achieve highly accurate distributed inference (DI) with low latency over unreliable communication links, and shows that COMtune enables accurate, low-latency predictions under lossy networks.
Learning Task-Oriented Communication for Edge Inference: An Information Bottleneck Approach
TLDR
This paper proposes a learning-based communication scheme that jointly optimizes feature extraction, source coding, and channel coding in a task-oriented manner, i.e., targeting the downstream inference task rather than data reconstruction.
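For reference, a common statement of the information bottleneck objective that this line of task-oriented work builds on is the following, where X is the input, Y the inference target, Z the transmitted feature, and the multiplier trades the communication cost I(X;Z) against the task-relevant information I(Z;Y); the paper itself optimizes a variational approximation of such a trade-off:

\min_{p(z \mid x)} \; \mathcal{L}_{\mathrm{IB}} = I(X;Z) - \beta\, I(Z;Y), \qquad \beta > 0.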
Optimal Model Placement and Online Model Splitting for Device-Edge Co-Inference
TLDR
This paper obtains a closed-form model placement solution for a fully-connected multilayer perceptron with an equal number of neurons per layer, and formulates an optimal stopping problem whose finite horizon is determined by the model placement decision.
Resource Allocation for Multiuser Edge Inference with Batching and Early Exiting (Extended Version)
TLDR
Experimental results demonstrate that both optimal and sub-optimal C² resource allocation algorithms can leverage integrated batching and early exiting to achieve a 200% throughput gain over conventional schemes.
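Early exiting, one of the two mechanisms named above, can be sketched as attaching an intermediate classifier and skipping the remaining layers when its prediction is already confident. The toy architecture and threshold below are assumptions; the batching and C² resource allocation of the cited work are not modeled.

# Minimal sketch of early exiting (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitNet(nn.Module):
    def __init__(self, exit_threshold: float = 0.9):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
        self.exit1 = nn.Linear(64, 10)        # cheap intermediate classifier
        self.stage2 = nn.Sequential(nn.Linear(64, 64), nn.ReLU())
        self.exit2 = nn.Linear(64, 10)        # final classifier
        self.exit_threshold = exit_threshold

    def forward(self, x: torch.Tensor):
        h = self.stage1(x)
        early_logits = self.exit1(h)
        conf = F.softmax(early_logits, dim=1).max(dim=1).values
        if bool((conf >= self.exit_threshold).all()):
            return early_logits, "early exit"   # second stage's computation is skipped
        return self.exit2(self.stage2(h)), "full pass"

if __name__ == "__main__":
    net = EarlyExitNet()
    logits, path = net(torch.randn(4, 32))
    print(path, logits.shape)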

References

SHOWING 1-10 OF 20 REFERENCES
Improving Device-Edge Cooperative Inference of Deep Learning via 2-Step Pruning
TLDR
This paper proposes an efficient and flexible 2-step pruning framework for DNN partition between mobile devices and edge servers that can greatly reduce either the wireless transmission workload of the device or the total computation workload.
Communication-Efficient Edge AI: Algorithms and Systems
TLDR
A comprehensive survey of the recent developments in various techniques for overcoming key communication challenges in edge AI systems is presented, and communication-efficient techniques are introduced from both algorithmic and system perspectives for training and inference tasks at the network edge.
BottleNet++: An End-to-End Approach for Feature Compression in Device-Edge Co-Inference Systems
Jiawei Shao, Jun Zhang · 2020 IEEE International Conference on Communications Workshops (ICC Workshops), 2020
TLDR
BottleNet++ is an end-to-end architecture consisting of an encoder, a non-trainable channel layer, and a decoder for more efficient feature compression and transmission, and it achieves a much higher compression ratio than existing methods.
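A minimal sketch of the encoder / non-trainable channel layer / decoder pattern is given below; the layer sizes and the additive-noise channel model are illustrative assumptions, not the BottleNet++ implementation.

# Minimal sketch of encoder -> non-trainable channel -> decoder feature compression
# for device-edge co-inference (illustrative only).
import torch
import torch.nn as nn

class FeatureCodec(nn.Module):
    def __init__(self, feature_dim: int = 256, code_dim: int = 16, snr_db: float = 10.0):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feature_dim, code_dim), nn.Tanh())
        self.decoder = nn.Sequential(nn.Linear(code_dim, feature_dim), nn.ReLU())
        self.noise_std = 10 ** (-snr_db / 20)   # noise level from the assumed channel SNR

    def forward(self, feature: torch.Tensor) -> torch.Tensor:
        code = self.encoder(feature)                              # device-side compression
        noisy = code + self.noise_std * torch.randn_like(code)    # non-trainable channel layer
        return self.decoder(noisy)                                # edge-side reconstruction

if __name__ == "__main__":
    codec = FeatureCodec()
    feature = torch.randn(1, 256)
    recovered = codec(feature)
    print("compression ratio:", feature.numel() / 16)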
JALAD: Joint Accuracy-And Latency-Aware Deep Structure Decoupling for Edge-Cloud Execution
TLDR
JALAD, a joint accuracy- and latency-aware execution framework, decouples a deep neural network so that one part runs at edge devices and the other part in the conventional cloud, while only a minimal amount of data is transferred between them.
Edge Intelligence: Paving the Last Mile of Artificial Intelligence With Edge Computing
TLDR
A comprehensive survey of the recent research efforts on EI is conducted, providing an overview of the overarching architectures, frameworks, and emerging key technologies for deep learning model training and inference at the network edge.
Once for All: Train One Network and Specialize it for Efficient Deployment
TLDR
This work proposes to train a once-for-all (OFA) network that supports diverse architectural settings by decoupling training from architecture search to reduce the cost, and introduces a novel progressive shrinking algorithm, a generalized pruning method that reduces the model size across many more dimensions than conventional pruning.
Toward an Intelligent Edge: Wireless Communication Meets Machine Learning
TLDR
A new set of design guidelines for wireless communication in edge learning, collectively called learning-driven communication, is advocated, which crosses and revolutionizes two disciplines: wireless communication and machine learning.
Model Compression and Acceleration for Deep Neural Networks: The Principles, Progress, and Challenges
TLDR
It is shown that the top face-verification results from the Labeled Faces in the Wild data set were obtained with networks containing hundreds of millions of parameters, using a mix of convolutional, locally connected, and fully connected layers.
To prune, or not to prune: exploring the efficacy of pruning for model compression
TLDR
Across a broad range of neural network architectures, large-sparse models are found to consistently outperform small-dense models and achieve up to 10x reduction in number of non-zero parameters with minimal loss in accuracy.
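The underlying operation, magnitude-based weight pruning, can be sketched as follows. The cited work prunes gradually during training; this one-shot example only shows the thresholding step that produces a large-sparse model.

# Minimal sketch of one-shot magnitude pruning to a target sparsity (illustrative only).
import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, sparsity: float = 0.9) -> None:
    """Zero out the smallest-magnitude weights of every Linear/Conv layer in place."""
    for module in model.modules():
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            weight = module.weight.data
            k = int(weight.numel() * sparsity)
            if k == 0:
                continue
            threshold = weight.abs().flatten().kthvalue(k).values
            weight.mul_((weight.abs() > threshold).float())   # keep only the large weights

if __name__ == "__main__":
    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
    magnitude_prune(model, sparsity=0.9)
    nonzero = sum((m.weight != 0).sum().item() for m in model if isinstance(m, nn.Linear))
    total = sum(m.weight.numel() for m in model if isinstance(m, nn.Linear))
    print(f"non-zero weights after pruning: {nonzero}/{total}")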
Neural Joint Source-Channel Coding
TLDR
This work proposes to jointly learn the encoding and decoding processes using a new discrete variational autoencoder model and obtains codes that are not only competitive against several separation schemes, but also learn useful robust representations of the data for downstream tasks such as classification.