Ganesh Ananthanarayanan

Experience from an operational map-reduce cluster reveals that outliers significantly prolong job completion. The causes for outliers include (i) machine characteristics, both hardware reliability (e.g., disk failures) as well as run-time contention for processor, memory and other resources, (ii) network characteristics with varying bandwidths and congestion…
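Why a few outliers matter so much can be shown with a simplified model. Assuming, for illustration only, that all tasks of a phase run in parallel in a single wave and the next phase waits for every one of them, the phase finishes only when its slowest task does. The task durations below are made-up numbers, not measurements from the cluster.

```python
def phase_completion_time(task_durations):
    """Simplified model (assumption): a phase whose tasks all run in one
    parallel wave ends only when its slowest task ends."""
    return max(task_durations)


# Hypothetical durations in seconds: 100 tasks of 10 s each,
# versus the same phase with a single 45 s outlier.
normal = [10.0] * 100
with_outlier = [10.0] * 99 + [45.0]

print(phase_completion_time(normal))        # 10.0
print(phase_completion_time(with_outlier))  # 45.0
```

In this toy model, one outlier among 100 tasks stretches the whole phase by 4.5x.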
Motivation
▪ Analytics jobs are parallel and process large amounts of data
▪ Machines have tens of gigabytes of memory
▪ Falling memory prices
▪ Median utilization of 19%
▪ Heavy-tailed input sizes
▪ Elephant and mice jobs
▪ 92% of smallest job inputs can fit in memory
All-or-nothing
Active research is being conducted on reducing the power consumption of all components of the Internet. To that end, we propose three schemes for power reduction in network switches: Time Window Prediction, Power Save Mode and Lightweight Alternative. These schemes adapt to changing traffic patterns and automatically tune their parameters to guarantee a…
Small jobs, which are typically run for interactive data analyses in datacenters, continue to be plagued by disproportionately long-running tasks called stragglers. In the production clusters at Facebook and Microsoft Bing, even after applying state-of-the-art straggler mitigation techniques, these latency-sensitive jobs have stragglers that are on average…
Tasks in modern data-parallel clusters have highly diverse resource requirements along CPU, memory, disk and network. Any of these resources may become a bottleneck, and hence the likelihood of wasting resources due to fragmentation is now larger. Today's schedulers do not explicitly reduce fragmentation. Worse, since they only allocate cores and memory,…
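One consequence of allocating only cores and memory can be sketched with a toy single-machine example. Everything here is hypothetical (the `Demand` type, the machine capacity, the per-task demands, the greedy admission loop): a check that looks only at cores and memory admits four tasks whose combined disk demand is double the machine's disk bandwidth, while a check over all three resources admits only two.

```python
from dataclasses import dataclass


@dataclass
class Demand:
    """Hypothetical resource vector for a task or a machine."""
    cores: float
    mem_gb: float
    disk_mbps: float


def fits_cores_mem(free: Demand, task: Demand) -> bool:
    # Admission check that looks only at cores and memory.
    return task.cores <= free.cores and task.mem_gb <= free.mem_gb


def fits_all(free: Demand, task: Demand) -> bool:
    # Admission check that also requires disk bandwidth to fit.
    return fits_cores_mem(free, task) and task.disk_mbps <= free.disk_mbps


def place(machine: Demand, tasks: list[Demand], fits) -> tuple[list[Demand], Demand]:
    """Greedily admit tasks in order; return admitted tasks and leftover resources."""
    free = Demand(machine.cores, machine.mem_gb, machine.disk_mbps)
    placed = []
    for t in tasks:
        if fits(free, t):
            free.cores -= t.cores
            free.mem_gb -= t.mem_gb
            free.disk_mbps -= t.disk_mbps  # goes negative if disk is never checked
            placed.append(t)
    return placed, free


machine = Demand(cores=16, mem_gb=64, disk_mbps=200)
tasks = [Demand(cores=2, mem_gb=4, disk_mbps=100) for _ in range(4)]

for name, check in [("cores+memory", fits_cores_mem), ("all resources", fits_all)]:
    placed, free = place(machine, tasks, check)
    print(f"{name}: {len(placed)} tasks placed, disk left {free.disk_mbps} MB/s")
```

A negative leftover means the admitted tasks contend for more disk bandwidth than the machine has, even though cores and memory were never exceeded.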
To improve data availability and resilience, MapReduce frameworks use file systems that replicate data uniformly. However, analysis of job logs from a large production cluster shows a wide disparity in data popularity. Machines and racks storing popular content become bottlenecks, thereby increasing the completion times of jobs accessing this data even…
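One way to act on popularity skew, shown here only as a hypothetical sketch and not as the policy from the work above, is to derive each file's replication factor from its observed access count instead of using one uniform factor. The parameters (`base=3`, `accesses_per_replica`, `max_replicas`) and the file names are illustrative assumptions.

```python
import math


def replication_factors(access_counts, base=3, max_replicas=10, accesses_per_replica=5):
    """Hypothetical policy: one replica per `accesses_per_replica` observed accesses,
    never below the uniform `base` factor and never above `max_replicas`."""
    factors = {}
    for name, n in access_counts.items():
        wanted = math.ceil(n / accesses_per_replica)
        factors[name] = min(max_replicas, max(base, wanted))
    return factors


# Made-up access counts for three files with skewed popularity.
print(replication_factors({"hot.log": 40, "warm.log": 22, "cold.log": 1}))
# {'hot.log': 8, 'warm.log': 5, 'cold.log': 3}
```

Rarely read files keep the uniform baseline, popular files get extra replicas so their read load spreads over more machines and racks, and the cap bounds the extra storage spent.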
Mobile devices are increasingly equipped with multiple network interfaces with complementary characteristics. In particular, the Wi-Fi interface has high throughput and transfer power efficiency, but its idle power consumption is prohibitive. In this paper, we present Blue-Fi, a system that predicts the availability of Wi-Fi connectivity by using…
Low-latency analytics on geographically distributed datasets (across datacenters, edge clusters) is an emerging and increasingly important challenge. The dominant approach of aggregating all the data to a single datacenter significantly inflates the latency of analytics. At the same time, running queries over geo-distributed inputs using the current…
Providing timely results in the face of rapid growth in data volumes has become important for analytical frameworks. For this reason, frameworks increasingly operate on only a subset of the input data. A key property of such sampling is that combinatorially many subsets of the input are present. We present KMN, a system that leverages these choices to…
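To make "combinatorially many subsets" concrete: assuming, for illustration, that a sample is an unordered set of K whole blocks drawn from N input blocks, there are C(N, K) candidate samples. The sketch below just counts them with Python's `math.comb` (Python 3.8+); the function name and block counts are hypothetical.

```python
from math import comb


def num_samples(n_blocks: int, k: int) -> int:
    """Count distinct k-block samples drawable from n_blocks blocks
    (illustrative assumption: a sample is an unordered set of whole blocks)."""
    return comb(n_blocks, k)


for n, k in [(10, 5), (20, 10), (100, 10)]:
    print(f"choose {k} of {n} blocks: {num_samples(n, k)} possible samples")
```

Even choosing 10 of 100 blocks yields 17,310,309,456,440 possible samples, which illustrates how much freedom there is in picking which subset to process.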