InFi: End-to-End Learning to Filter Input for Resource-Efficiency in Mobile-Centric Inference
@article{Yuan2022InFiEL,
  title={InFi: End-to-End Learning to Filter Input for Resource-Efficiency in Mobile-Centric Inference},
  author={Mu Yuan and Lan Zhang and Fengxiang He and Xueting Tong and Miao-Hui Song and Xiang Li},
  journal={ArXiv},
  year={2022},
  volume={abs/2209.13873}
}
Mobile-centric AI applications place high demands on the resource efficiency of model inference. Input filtering is a promising approach to eliminating redundant input and thereby reducing the cost of inference. Previous efforts have tailored effective solutions for many applications but left two essential questions unanswered: (1) the theoretical filterability of an inference workload, which would guide the application of input filtering techniques and thereby avoid the trial-and-error cost for resource-constrained…