Swayam: Distributed Autoscaling to Meet SLAs of Machine Learning Inference Services with Resource Efficiency

Abstract

Developers use Machine Learning (ML) platforms to train ML models and then deploy these ML models as web services for inference (prediction). A key challenge for platform providers is to guarantee response-time Service Level Agreements (SLAs) for inference workloads while maximizing resource efficiency. Swayam is a fully distributed autoscaling framework that…
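To make the capacity side of this trade-off concrete, the sketch below shows the kind of SLA-driven sizing calculation an autoscaler for inference services must perform: given a request rate and per-request inference time, estimate how many backends keep utilization low enough to leave latency headroom. This is only an illustrative back-of-the-envelope model with an assumed `target_utilization` knob and hypothetical function names, not Swayam's actual algorithm, which the truncated abstract does not describe.

```python
import math

def servers_needed(request_rate_rps, mean_service_time_s, target_utilization=0.6):
    """Estimate how many identical backends keep average utilization below
    `target_utilization`, leaving headroom so bursts are less likely to
    violate a response-time SLA. (Illustrative model, not Swayam's method.)

    request_rate_rps    -- observed or predicted arrival rate (requests/s)
    mean_service_time_s -- mean per-request inference time on one backend (s)
    target_utilization  -- fraction of capacity to use; lower values give
                           more latency headroom (hypothetical knob)
    """
    # Offered load = average number of busy backends if work were spread evenly.
    offered_load = request_rate_rps * mean_service_time_s
    return max(1, math.ceil(offered_load / target_utilization))

# Example: 800 req/s, 50 ms mean inference time, keep utilization under 60%.
print(servers_needed(800, 0.050, 0.6))  # -> 67 backends
```

A real autoscaler would refine this with tail-latency targets and provisioning delays, but even this simple model shows why meeting SLAs requires deliberately over-provisioning relative to average load.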
