- John F. Canny, Huasha Zhao
- KDD
- 2013

This paper describes the BID Data Suite, a collection of hardware, software and design patterns that enable fast, large-scale data mining at very low cost. By co-designing all of these elements we achieve single-machine performance levels that equal or exceed reported *cluster* implementations for common benchmark problems. A key design criterion is…

A Vehicular Sensor Network (VSN) may be used for urban environment surveillance utilizing vehicle-based sensors to provide an affordable yet good coverage for the urban area. The sensors in VSN enjoy the vehicle’s steady power supply and strong computational capacity not available in traditional Wireless Sensor Network (WSN). However, the mobility of the…

- John F. Canny, Huasha Zhao, Bobby Jaros, Ye Chen, Jiangchang Mao
- 2015 IEEE International Conference on Big Data…
- 2015

Many systems have been developed for machine learning at scale. Performance has steadily improved, but there has been relatively little work on explicitly defining or approaching the limits of performance. In this paper we describe the application of roofline design, an approach borrowed from computer architecture, to large-scale machine learning. In…

- Huasha Zhao, Biye Jiang, John F. Canny, Bobby Jaros
- KDD
- 2015

Gibbs sampling is a workhorse for Bayesian inference but has several limitations when used for parameter estimation, and is often much slower than non-sampling inference methods. SAME (State Augmentation for Marginal Estimation) [15, 8] is an approach to MAP parameter estimation which gives improved parameter estimates over direct Gibbs sampling. SAME can…

- John F. Canny, Huasha Zhao
- SDM
- 2013

Incremental model-update strategies are widely used in machine learning and data mining. By “incremental update” we refer to models that are updated many times using small subsets of the training data. Two well-known examples are stochastic gradient and MCMC. Both provide fast sequential performance and have generated many of the best-performing methods for…

- Huasha Zhao
- 2014

High Performance Machine Learning through Codesign and Rooflining

- Huasha Zhao, John F. Canny
- 2014 43rd International Conference on Parallel…
- 2014

Allreduce is a basic building block for parallel computing. Our target here is "Big Data" processing on commodity clusters (mostly sparse power-law data). Allreduce can be used to synchronize models, to maintain distributed datasets, and to perform operations on distributed data such as sparse matrix multiply. We first review a key constraint on cluster…

- Huasha Zhao, John F. Canny
- 2012

Stochastic gradient descent is a widely used method to find locally-optimal models in machine learning and data mining. However, it is naturally a sequential algorithm, and parallelization involves severe compromises because the cost of synchronizing across a cluster is much larger than the time required to compute an optimal-sized gradient step. Here we…

- Huasha Zhao, John F. Canny
- ArXiv
- 2013

Many large datasets exhibit power-law statistics: the web graph, social networks, text data, clickthrough data, etc. Their adjacency graphs are termed natural graphs, and are known to be difficult to partition. As a consequence, most distributed algorithms on these graphs are communication-intensive. Many algorithms on natural graphs involve an Allreduce: a…

- Huasha Zhao, Ye Chen, John F. Canny, Tak W. Yan
- CIKM
- 2014

Search advertising shows trends of vertical extension. Vertical ads typically offer better Return on Investment (ROI) to advertisers as a result of better user engagement. However, campaigns and bids in vertical ads are not set at the keyword level. As a result, the matching between user query and ads suffers from a low recall rate, and the match quality is heavily…