InFi: End-to-End Learning to Filter Input for Resource-Efficiency in Mobile-Centric Inference

Mu Yuan, Lan Zhang, Fengxiang He, Xueting Tong, Miao-Hui Song, Xiang Li
Mobile-centric AI applications have high requirements for resource-efficiency of model inference. Input filtering is a promising approach to eliminate the redundancy so as to reduce the cost of inference. Previous efforts have tailored effective solutions for many applications, but left two essential questions unanswered: (1) theoretical filterability of an inference workload to guide the application of input filtering techniques, thereby avoiding the trial-and-error cost for resource-constrained…



Limits of End-to-End Learning

This work asks whether, and to what extent, end-to-end learning is a future-proof technique, in the sense of scaling to complex and diverse data processing architectures.

Elf: accelerate high-resolution mobile deep vision with content-aware parallel offloading

This work presents the design of Elf, a framework that accelerates mobile deep vision applications under any server provisioning through parallel offloading. Elf employs a recurrent region proposal prediction algorithm, region-proposal-centric frame partitioning, and a resource-aware multi-offloading scheme.

MobileNetV2: Inverted Residuals and Linear Bottlenecks

This work describes MobileNetV2, a new mobile architecture that improves the state-of-the-art performance of mobile models on multiple tasks and benchmarks, as well as across a spectrum of model sizes, and that allows decoupling of the input/output domains from the expressiveness of the transformation.

BlazeIt: Optimizing Declarative Aggregation and Limit Queries for Neural Network-Based Video Analytics

This work introduces BlazeIt, a system that optimizes queries over spatiotemporal information about objects in video, along with two new query optimization techniques that are not supported by prior work.

AMC: AutoML for Model Compression and Acceleration on Mobile Devices

This paper proposes AutoML for Model Compression (AMC), which leverages reinforcement learning to efficiently sample the design space. AMC improves model compression quality and achieves state-of-the-art compression results in a fully automated way, without human effort.

Potluck: Cross-Application Approximate Deduplication for Computation-Intensive Mobile Applications

This paper presents Potluck, a cache service that stores and shares processing results between applications, together with a set of algorithms that process the input data to maximize deduplication opportunities. Potluck is implemented as a background service on Android.

MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints

This work describes how several common DNNs, when subjected to state-of-the-art optimizations, trade off accuracy for resource use such as memory, computation, and energy, and introduces two new and powerful DNN optimizations that exploit it.

A Survey on the Optimization of Neural Network Accelerators for Micro-AI On-Device Inference

This work provides a comprehensive survey of recent developments in the energy-efficient deployment of DNNs on micro-AI platforms. It looks at different neural architecture search strategies as part of micro-AI model design and provides extensive details about model compression and quantization strategies in practice.

Reducto: On-Camera Filtering for Resource-Efficient Real-Time Video Analytics

This work builds Reducto, a system that dynamically adapts filtering decisions according to the time-varying correlation between feature type, filtering threshold, query accuracy, and video content. Reducto achieves significant filtering benefits while consistently meeting the desired accuracy.

A Hybrid Deep Learning Architecture for Privacy-Preserving Mobile Analytics

This article presents a hybrid approach for breaking down large, complex deep neural networks for cooperative, privacy-preserving analytics. It shows that, by using Siamese fine-tuning and at a small processing cost, this approach can greatly reduce the level of unnecessary, potentially sensitive information in the personal data.