Ganesh Ananthanarayanan

Experience from an operational Map-Reduce cluster reveals that outliers significantly prolong job completion. The causes for outliers include run-time contention for processor, memory and other resources, disk failures, varying bandwidth and congestion along network paths, and imbalance in task workload. We present Mantri, a system that monitors tasks and(More)
Small jobs, which are typically run for interactive data analyses in datacenters, continue to be plagued by disproportionately long-running tasks called stragglers. In the production clusters at Facebook and Microsoft Bing, even after applying state-of-the-art straggler mitigation techniques, these latency-sensitive jobs have stragglers that are on average 8(More)
To improve data availability and resilience, MapReduce frameworks use file systems that replicate data <i>uniformly</i>. However, analysis of job logs from a large production cluster shows wide disparity in data popularity. Machines and racks storing popular content become bottlenecks, thereby increasing the completion times of jobs accessing this data even(More)
▪ Replayed Facebook and Bing workloads
▪ LIFE reduces average completion time by 53% and 51% in Facebook and Bing workloads
▪ Small jobs see 77% improvement
▪ LFU-F improves cluster utilization by 47% and 53% in the Facebook and Bing workloads
▪ LIFE and LFU-F beat Belady’s MIN despite lower cache hit-ratio
▪ Pre-fetch & Pre-replace → Ideal (87%) speedup
▪(More)
Mobile devices are increasingly equipped with multiple network interfaces: Wireless Local Area Network (WLAN) interfaces for local connectivity and Wireless Wide Area Network (WWAN) interfaces for wide-area connectivity. The WWAN typically provides much wider coverage but much lower speeds than the WLAN. To address this dichotomy, we present COMBINE, a(More)
Low latency analytics on geographically distributed datasets (across datacenters, edge clusters) is an upcoming and increasingly important challenge. The dominant approach of aggregating all the data to a single datacenter significantly inflates the timeliness of analytics. At the same time, running queries over geo-distributed inputs using the current(More)
Tasks in modern data parallel clusters have highly diverse resource requirements, along CPU, memory, disk and network. Any of these resources may become bottlenecks and hence, the likelihood of wasting resources due to fragmentation is now larger. Today's schedulers do not explicitly reduce fragmentation. Worse, since they only allocate cores and memory,(More)
Mobile devices are increasingly equipped with multiple network interfaces with complementary characteristics. In particular, the Wi-Fi interface has high throughput and transfer power efficiency, but its idle power consumption is prohibitive. In this paper we present <i>Blue-Fi</i>, a system that predicts the availability of the Wi-Fi connectivity by using(More)
Active research is being conducted on reducing the power consumption of all the components of the Internet. To that end, we propose schemes for power reduction in network switches: Time Window Prediction, Power Save Mode, and Lightweight Alternative. These schemes are adaptive to changing traffic patterns and automatically tune their parameters to guarantee a(More)
Data-intensive computing (DISC) frameworks scale by partitioning a <i>job</i> across a set of fault-tolerant <i>tasks</i>, then diffusing those tasks across large clusters. Multi-tenanted clusters must accommodate service-level objectives (SLOs) in their resource model, often expressed as a maximum latency for allocating the desired set of resources to every(More)