Petuum: A New Platform for Distributed Machine Learning on Big Data
@article{Xing2015PetuumAN,
  title   = {Petuum: A New Platform for Distributed Machine Learning on Big Data},
  author  = {E. Xing and Q. Ho and Wei Dai and J. Kim and Jinliang Wei and S. Lee and X. Zheng and Pengtao Xie and Abhimanu Kumar and Y. Yu},
  journal = {IEEE Transactions on Big Data},
  year    = {2015},
  volume  = {1},
  pages   = {49--67}
}
What is a systematic way to efficiently apply a wide spectrum of advanced ML programs to industrial-scale problems, using Big Models (up to 100s of billions of parameters) on Big Data (up to terabytes or petabytes)? Modern parallelization strategies employ fine-grained operations and scheduling beyond the classic bulk-synchronous processing paradigm popularized by MapReduce, or even specialized graph-based execution that relies on graph representations of ML programs. The variety of approaches…
187 Citations
KunPeng: Parameter Server based Distributed Learning Systems and Its Applications in Alibaba and Ant Financial
- KDD, 2017
Strategies and Principles of Distributed Machine Learning on Big Data
- arXiv, 2015
BLAS-on-flash: An Efficient Alternative for Large Scale ML Training and Inference?
- NSDI, 2019
Benchmarking Harp-DAAL: High Performance Hadoop on KNL Clusters
- IEEE 10th International Conference on Cloud Computing (CLOUD), 2017
MiMatrix: A Massively Distributed Deep Learning Framework on a Petascale High-density Heterogeneous Cluster
- arXiv, 2018
Parallel Processing Systems for Big Data: A Survey
- Proceedings of the IEEE, 2016
References
Showing 1-10 of 37 references
Distributed GraphLab: A Framework for Machine Learning in the Cloud
- Proc. VLDB Endow., 2012
High-Performance Distributed ML at Scale through Parameter Server Consistency Models
- AAAI, 2015
Fugue: Slow-Worker-Agnostic Distributed Learning for Big Models on Big Data
- AISTATS, 2014
Scaling Distributed Machine Learning with the Parameter Server
- BigDataScience '14, 2014