Publications
More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server
TLDR: We propose a parameter server system for distributed ML, which follows a Stale Synchronous Parallel (SSP) model of computation that maximizes the time computational workers spend doing useful work on ML algorithms, while still providing correctness guarantees. (An illustrative sketch of the SSP idea follows this entry.)
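The SSP model above bounds how far ahead any worker may run relative to the slowest one. Below is a minimal, illustrative Python sketch of that bounded-staleness check, assuming a toy coordinator class (the name SSPClock and its methods are hypothetical, not the paper's parameter server implementation): each worker advances a per-iteration clock, and a worker blocks only when it would get more than `staleness` iterations ahead of the slowest worker.

```python
import threading

class SSPClock:
    """Toy coordinator illustrating Stale Synchronous Parallel (SSP):
    a worker may keep computing on cached (stale) parameters as long as
    it is at most `staleness` iterations ahead of the slowest worker."""

    def __init__(self, num_workers, staleness):
        self.clocks = [0] * num_workers
        self.staleness = staleness
        self.cond = threading.Condition()

    def clock(self, worker_id):
        # Called when a worker finishes an iteration: advance its clock
        # and wake up any workers waiting on the staleness bound.
        with self.cond:
            self.clocks[worker_id] += 1
            self.cond.notify_all()

    def wait_if_too_far_ahead(self, worker_id):
        # Block only while this worker is more than `staleness` iterations
        # ahead of the slowest worker; otherwise return immediately and
        # let it proceed with possibly stale parameter values.
        with self.cond:
            while self.clocks[worker_id] - min(self.clocks) > self.staleness:
                self.cond.wait()
```

With staleness set to 0 this reduces to Bulk Synchronous Parallel; larger values let fast workers keep doing useful work on slightly stale parameters instead of idling at a barrier, which is the trade-off the TLDR describes.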
LightLDA: Big Topic Models on Modest Computer Clusters
TLDR: We propose a new, highly efficient O(1) Metropolis-Hastings sampling algorithm, whose running cost is (surprisingly) agnostic of model size, and which empirically converges nearly an order of magnitude more quickly than current state-of-the-art Gibbs samplers. (An illustrative sketch of the alias-table/Metropolis-Hastings pattern follows this entry.)
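The O(1) cost described above comes from combining alias tables with a Metropolis-Hastings correction. The following is a generic, illustrative Python sketch of that pattern, not the LightLDA code: an alias table (Vose's method, assumed here as the proposal mechanism) gives O(1) draws from a fixed proposal distribution, and an acceptance step corrects toward the true conditional. The function names and the assumption of strictly positive probabilities are mine.

```python
import random

def build_alias_table(probs):
    """Vose's alias method: O(K) construction, then O(1) per draw."""
    k = len(probs)
    scaled = [p * k for p in probs]
    prob, alias = [0.0] * k, [0] * k
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:       # leftovers are numerically ~1.0
        prob[i] = 1.0
    return prob, alias

def alias_draw(prob, alias):
    # O(1): pick a bucket uniformly, then keep it or take its alias.
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]

def mh_step(current, true_p, proposal_p, prob, alias):
    """One Metropolis-Hastings step: propose in O(1) from a (possibly stale)
    proposal distribution, then accept/reject so the chain still targets
    the true conditional true_p (all probabilities assumed > 0)."""
    proposed = alias_draw(prob, alias)
    ratio = (true_p[proposed] * proposal_p[current]) / \
            (true_p[current] * proposal_p[proposed])
    return proposed if random.random() < min(1.0, ratio) else current
```

Because the alias table can be reused for many draws before it is rebuilt, the per-draw cost does not depend on the number of topics, which is the model-size-agnostic behavior the TLDR highlights.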
Petuum: A New Platform for Distributed Machine Learning on Big Data
  • E. Xing, Q. Ho, +7 authors Y. Yu
  • Computer Science
  • IEEE Transactions on Big Data
  • 30 December 2013
TLDR: We propose a general-purpose framework, Petuum, that systematically addresses data- and model-parallel challenges in large-scale ML, by observing that many ML programs are fundamentally optimization-centric and admit error-tolerant, iterative-convergent algorithmic solutions.
Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters
TLDR: We present Poseidon, an efficient communication architecture for distributed deep learning on GPU clusters.
Exploiting Bounded Staleness to Speed Up Big Data Analytics
TLDR: This paper explores approaches to exploiting the convergent nature of iterative ML algorithms to improve performance, by allowing parallel and distributed threads to use loose consistency models for shared algorithm state.
Managed communication and consistency for fast data-parallel iterative analytics
TLDR: We present Bösen, a system that maximizes network communication efficiency under a given inter-machine bandwidth budget to minimize parallel error, while ensuring theoretical convergence guarantees for large-scale data-parallel ML applications. (An illustrative sketch of the bandwidth-budget idea follows this entry.)
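One way to read "managed communication under a bandwidth budget" is: when the network cannot carry every pending parameter update, spend the budget on the updates that matter most and keep the rest accumulated locally. The sketch below only illustrates that prioritization idea under assumed names (select_updates, pending_updates, budget_bytes); it is not the Bösen implementation, which additionally enforces bounded-staleness consistency.

```python
def select_updates(pending_updates, budget_bytes, bytes_per_update):
    """Greedy illustration of bandwidth-managed communication: given a
    per-round byte budget, send the largest-magnitude pending updates
    first and defer the rest to a later round."""
    ranked = sorted(pending_updates.items(),
                    key=lambda kv: abs(kv[1]), reverse=True)
    to_send, deferred, used = {}, {}, 0
    for key, delta in ranked:
        if used + bytes_per_update <= budget_bytes:
            to_send[key] = delta
            used += bytes_per_update
        else:
            deferred[key] = delta   # accumulates locally until sent later
    return to_send, deferred

# Example: with room for two updates, the two largest deltas go out first.
send, keep = select_updates({"w1": -0.05, "w2": 0.4, "w3": 0.9},
                            budget_bytes=8, bytes_per_update=4)
# send == {"w3": 0.9, "w2": 0.4}; keep == {"w1": -0.05}
```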
Solving the Straggler Problem with Bounded Staleness
TLDR: We propose the Stale Synchronous Parallel (SSP) model as a generalization of BSP that preserves many of its advantages while avoiding the straggler problem.
Unified analysis of streaming news
TLDR: We present a unified framework to group incoming news articles into temporary but tightly focused storylines, to identify prevalent topics and key entities within these stories, and to reveal the temporal structure of stories as they evolve.
High-Performance Distributed ML at Scale through Parameter Server Consistency Models
TLDR: We study the theoretical guarantees and empirical behavior of iterative-convergent ML algorithms under existing parameter server (PS) consistency models, and propose a new PS system that enables ML algorithms to reach their solutions more quickly.