Dolphin: Runtime Optimization for Distributed Machine Learning

Abstract

Large-scale machine learning (ML) systems are becoming widely used. Typically, these ML systems run on fixed resources, but their optimal configurations (e.g., how many nodes to use, how to distribute data) are difficult to find, since they depend on multiple factors such as the hardware environment, the ML algorithm, and the input dataset. Furthermore, optimal…

1 Figure or Table
