Learning Task-Oriented Communication for Edge Inference: An Information Bottleneck Approach

Authors: Jiawei Shao, Yuyi Mao, Jun Zhang
Published in: IEEE Journal on Selected Areas in Communications
This paper investigates task-oriented communication for edge inference, where a low-end edge device transmits the extracted feature vector of a local data sample to a powerful edge server for processing. It is critical to encode the data into an informative and compact representation for low-latency inference given the limited bandwidth. We propose a learning-based communication scheme that jointly optimizes feature extraction, source coding, and channel coding in a task-oriented manner, i.e… 
Task-Oriented Communication for Multi-Device Cooperative Edge Inference
This paper proposes a learning-based communication scheme that optimizes local feature extraction and distributed feature encoding in a task-oriented manner. It leverages the information bottleneck principle to extract the task-relevant feature at each edge device and adopts a distributed information bottleneck framework.
Communication-Computation Efficient Device-Edge Co-Inference via AutoML
The selection of a suitable model split point and an encoder/decoder pair for the intermediate feature vector is cast as a sequential decision problem, for which a novel automated machine learning (AutoML) framework based on deep reinforcement learning (DRL) is proposed.
Goal-Oriented Communication for Edge Learning Based On the Information Bottleneck
A goal-oriented communication system is proposed that combines the IB principle with stochastic optimization; the IB principle is used to design the encoder so as to balance representation complexity against the relevance of the encoded data with respect to the goal.
Semantic Communication: An Information Bottleneck View
This work proposes an information-theoretic framework where the semantic context is explicitly introduced into probabilistic models and reveals the huge potential of a semantic communication system design.
Adaptable Semantic Compression and Resource Allocation for Task-Oriented Communications
A deep learning-based task-oriented communication architecture is proposed where the user extracts, compresses and transmits semantics in an end-to-end (E2E) manner and an approach is proposed to compress the semantics according to their importance relevant to the task, namely, adaptable semantic compression (ASC).
Resource-Constrained Edge AI with Early Exit Prediction
This paper designs a low-complexity module, namely the Exit Predictor, to guide some distinctly “hard” samples to bypass the computation of the early exits. It also extends the early exit prediction mechanism to latency-aware edge inference by adapting the prediction thresholds of the Exit Predictor and the confidence thresholds of the early-exit network via a few simple regression models.
Graph Neural Networks for Wireless Communications: From Theory to Practice
For design guidelines, this paper proposes a unified framework that is applicable to general design problems in wireless networks, which includes graph modeling, neural architecture design, and theory-guided performance enhancement, and extensive simulations verify the theory and effectiveness of the proposed design framework.
Edge Learning for B5G Networks with Distributed Signal Processing: Semantic Communication, Edge Computing, and Wireless Sensing
An overview of practical distributed EL techniques and their interplay with advanced communication optimization designs is provided, and a first mathematical model of the goal-oriented source entropy as an optimization problem is presented for application in goal-oriented semantic communication.
Multi-user Co-inference with Batch Processing Capable Edge Server
This work focuses on novel scenarios in which energy-constrained mobile devices offload inference tasks to an edge server with a GPU. It is proven that optimizing the offloading policy of each user independently and aggregating all the same sub-tasks in one batch is optimal, which motivates the independent partitioning and same sub-task aggregating (IP-SSA) algorithm.
Deep Joint Source-Channel Coding for CSI Feedback: An End-to-End Approach
A deep joint source-channel coding (DJSCC) based framework for the CSI feedback task that can simultaneously learn from the CSI source and the wireless channel and that applies non-linear transform networks to compress the CSI.


Communication-Computation Trade-off in Resource-Constrained Edge Inference
This article presents effective methods for edge inference at resource-constrained devices, focusing on device-edge co-inference, assisted by an edge computing server, and investigates a critical trade-off among the computational cost of the on-device model and the communication overhead of forwarding the intermediate feature to the edge server.
On the Information Bottleneck Problems: Models, Connections, Applications and Information Theoretic Views
This tutorial paper focuses on the variants of the bottleneck problem taking an information theoretic perspective and discusses practical methods to solve it, as well as its connection to coding and…
BottleNet++: An End-to-End Approach for Feature Compression in Device-Edge Co-Inference Systems
Jiawei Shao, Jun Zhang. Computer Science. 2020 IEEE International Conference on Communications Workshops (ICC Workshops), 2020.
An end-to-end architecture that consists of an encoder, a non-trainable channel layer, and a decoder for more efficient feature compression and transmission, which achieves a much higher compression ratio than existing methods.
Improving Device-Edge Cooperative Inference of Deep Learning via 2-Step Pruning
This paper proposes an efficient and flexible 2-step pruning framework for DNN partition between mobile devices and edge servers that can greatly reduce either the wireless transmission workload of the device or the total computation workload.
Edge Intelligence: On-Demand Deep Learning Model Co-Inference with Device-Edge Synergy
Edgent is a collaborative and on-demand DNN co-inference framework with device-edge synergy that pursues two design knobs, one being DNN partitioning, which adaptively partitions DNN computation between device and edge to leverage hybrid computation resources in proximity for real-time DNN inference.
Neural Joint Source-Channel Coding
This work proposes to jointly learn the encoding and decoding processes using a new discrete variational autoencoder model and obtains codes that are not only competitive against several separation schemes, but also learn useful robust representations of the data for downstream tasks such as classification.
Graph Neural Networks for Scalable Radio Resource Management: Architecture Design and Theoretical Analysis
This paper demonstrates that radio resource management problems can be formulated as graph optimization problems that enjoy a universal permutation equivariance property, and identifies a family of neural networks, named message passing graph neural networks (MPGNNs), which can generalize to large-scale problems, while enjoying a high computational efficiency.
Information Dropout: Learning Optimal Representations Through Noisy Computation
It is proved that Information Dropout achieves a comparable or better generalization performance than binary dropout, especially on smaller models, since it can automatically adapt the noise to the structure of the network, as well as to the test sample.
Communication-Efficient Edge AI: Algorithms and Systems
A comprehensive survey of the recent developments in various techniques for overcoming key communication challenges in edge AI systems is presented, and communication-efficient techniques are introduced from both algorithmic and system perspectives for training and inference tasks at the network edge.
BlockDrop: Dynamic Inference Paths in Residual Networks
BlockDrop, an approach that learns to dynamically choose which layers of a deep network to execute during inference so as to best reduce total computation without degrading prediction accuracy, is introduced.