Corpus ID: 246035830

GEMEL: Model Merging for Memory-Efficient, Real-Time Video Analytics at the Edge

Authors: Arthi Padmanabhan, Neil Agarwal, Anand Iyer, Ganesh Ananthanarayanan, Yuanchao Shu, Nikolaos Karianakis, Guoqing Harry Xu, Ravi Netravali
Video analytics pipelines have steadily shifted to edge deployments to reduce bandwidth overheads and privacy violations, but in doing so, face an ever-growing resource tension. Most notably, edge-box GPUs lack the memory needed to concurrently house the growing number of (increasingly complex) models for real-time inference. Unfortunately, existing solutions that rely on time/space sharing of GPU resources are insufficient as the required swapping delays result in unacceptable frame drops and… 



PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications

The key idea is to leverage the layered structure of neural network models and their layer-by-layer computation pattern to pipeline model transmission over the PCIe and task execution in the GPU with model-aware grouping.
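The benefit of overlapping per-layer transfer with per-layer execution can be illustrated with a small timing model. This is only a sketch under stated assumptions, not PipeSwitch's implementation: the function names and the per-layer timings are hypothetical, each layer is treated as its own pipeline unit (the model-aware grouping mentioned above is not modeled), and the PCIe link and the GPU are each assumed to handle one layer at a time.

```python
def sequential_time(transfer_ms, compute_ms):
    """Total time when the whole model is copied before any layer runs."""
    return sum(transfer_ms) + sum(compute_ms)


def pipelined_time(transfer_ms, compute_ms):
    """Total time when layer i can run as soon as it has arrived
    and layer i-1 has finished executing."""
    transfer_done = 0.0  # time at which the current layer finishes copying
    compute_done = 0.0   # time at which the current layer finishes executing
    for t, c in zip(transfer_ms, compute_ms):
        transfer_done += t
        # A layer starts once it is on the GPU and the previous layer is done.
        compute_done = max(compute_done, transfer_done) + c
    return compute_done


# Hypothetical per-layer costs in milliseconds (illustrative values only).
transfer_ms = [4.0, 2.0, 2.0, 1.0]
compute_ms = [1.0, 3.0, 2.0, 2.0]

print(sequential_time(transfer_ms, compute_ms))  # 17.0
print(pipelined_time(transfer_ms, compute_ms))   # 13.0
```

With these made-up numbers, pipelining hides part of the transfer cost behind execution; the actual gain depends on the real per-layer costs and on how layers are grouped.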

Memory Optimization for Deep Networks

MONeT is an automatic framework that minimizes both the memory footprint and computational overhead of deep networks, and is able to outperform all prior hand-tuned operations as well as automated checkpointing.

Reducto: On-Camera Filtering for Resource-Efficient Real-Time Video Analytics

Reducto is a system that dynamically adapts filtering decisions according to the time-varying correlation between feature type, filtering threshold, query accuracy, and video content; it achieves significant filtering benefits while consistently meeting the desired accuracy.

Nexus: a GPU cluster engine for accelerating DNN-based video analysis

Nexus is a fully implemented system that includes cluster-scale resource management that performs detailed scheduling of GPUs, reasons about groups of DNN invocations that need to be co-scheduled, and moves from the conventional whole-DNN execution model to executing fragments of DNNs.

PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems

PRETZEL is a prediction serving system that introduces a novel white-box architecture enabling both end-to-end and multi-model optimizations; on average, it reduces 99th percentile latency and memory footprint while increasing throughput.

Serving DNNs like Clockwork: Performance Predictability from the Bottom Up

This work adopts a principled design methodology to successively build a fully distributed model serving system that achieves predictable end-to-end performance, and demonstrates that Clockwork exploits predictable execution times to achieve tight request-level service-level objectives (SLOs) as well as a high degree of request-level performance isolation.

MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints

This work describes how several common DNNs, when subjected to state-of-the-art optimizations, trade off accuracy for resource use such as memory, computation, and energy, and introduces two new and powerful DNN optimizations that exploit it.

Ekya: Continuous Learning of Video Analytics Models on Edge Compute Servers

This work addresses the challenge of jointly supporting inference and retraining tasks on edge servers, which requires navigating the fundamental tradeoff between the retrained model’s accuracy and the inference accuracy.

Bridging the Edge-Cloud Barrier for Real-time Advanced Vision Analytics

CloudSeg is an edge-to-cloud framework for advanced vision analytics that co-designs the cloud-side inference with real-time video streaming to achieve both low latency and high inference accuracy.

Mistify: Automating DNN Model Porting for On-Device Inference at the Edge

The challenges of manually generating a large number of compressed models are quantified, and a system framework, Mistify, is built to automatically port a cloud-based model to a suite of models for edge devices targeting various points in the design space, reducing the manual effort.