
- Qirong Ho, James Cipar, +6 authors Eric P. Xing
- NIPS
- 2013

We propose a parameter server system for distributed ML, which follows a Stale Synchronous Parallel (SSP) model of computation that maximizes the time computational workers spend doing useful work on ML algorithms, while still providing correctness guarantees. The parameter server provides an easy-to-use shared interface for read/write access to an ML…

- Dan Li, Henggang Cui, Yan Hu, Yong Xia, Xin Wang
- 2011 19th IEEE International Conference on…
- 2011

Multicast benefits data center group communications in saving network bandwidth and increasing application throughput. However, it is challenging to scale Multicast to support tens of thousands of concurrent group communications due to limited forwarding table memory space in the switches, particularly the low-end ones commonly used in modern data centers.…

- Jinliang Wei, Wei Dai, +6 authors Eric P. Xing
- SoCC
- 2015

At the core of Machine Learning (ML) analytics is often an expert-suggested model, whose parameters are refined by iteratively processing a training dataset until convergence. The completion time (i.e., convergence time) and quality of the learned model depend not only on the rate at which the refinements are generated but also on the quality of each…

- Henggang Cui, James Cipar, +9 authors Eric P. Xing
- USENIX Annual Technical Conference
- 2014

Many modern machine learning (ML) algorithms are iterative, converging on a final solution via many iterations over the input data. This paper explores approaches to exploiting these algorithms' convergent nature to improve performance, by allowing parallel and distributed threads to use loose consistency models for shared algorithm state. Specifically, we…

- Henggang Cui, Hao Zhang, Gregory R. Ganger, Phillip B. Gibbons, Eric P. Xing
- EuroSys
- 2016

Large-scale deep learning requires huge computational resources to train a multi-layer neural network. Recent systems propose using 100s to 1000s of machines to train networks with tens of layers and billions of connections. While the computation involved can be done more efficiently on GPUs than on more traditional CPU cores, training such networks on a…

- Henggang Cui, Alexey Tumanov, +8 authors Eric P. Xing
- SoCC
- 2014

Many large-scale machine learning (ML) applications use iterative algorithms to converge on parameter values that make the chosen model fit the input data. Often, this approach results in the same sequence of accesses to parameters repeating each iteration. This paper shows that these repeating patterns can and should be exploited to improve the efficiency…

- Henggang Cui, Danielle Rasooly, Moises R. N. Ribeiro, Leonid Kazovsky
- 2011

Our proposal is to gradually deploy 2x2 optical switches in hypercube planes, reducing transit-traffic processing by about 15% for bidirectional physical connections and forwarding traffic by over 20% on unidirectional links.

- Aaron Harlap, Henggang Cui, +5 authors Eric P. Xing
- SoCC
- 2016

FlexRR provides a scalable, efficient solution to the straggler problem for iterative machine learning (ML). The frequent (e.g., per iteration) barriers used in traditional BSP-based distributed ML implementations cause every transient slowdown of any worker thread to delay all others. FlexRR combines a more flexible synchronization model with dynamic…

Time series analysis is commonly used when monitoring data centers, networks, weather, and even human patients. In most cases, the raw time series data is massive, from millions to billions of data points, and yet interactive analyses require low (e.g., sub-second) latency. Aperture transforms raw time series data, during ingest, into compact summarized…