Kay Ousterhout

Learn More
Large-scale data analytics frameworks are shifting towards shorter task durations and larger degrees of parallelism to provide low latency. Scheduling highly parallel jobs that complete in hundreds of milliseconds poses a major challenge for task schedulers, which will need to schedule millions of tasks per second on appropriate machines while offering(More)
There has been much research devoted to improving the performance of data analytics frameworks, but comparatively little effort has been spent systematically identifying the performance bottlenecks of these systems. In this paper, we develop blocked time analysis, a methodology for quantifying performance bottlenecks in distributed computation frameworks,(More)
Large-scale data analytics frameworks are shifting towards shorter task durations and larger degrees of parallelism to provide low latency. However, scheduling highly parallel jobs that complete in hundreds of milliseconds poses a major challenge for cluster schedulers, which will need to place millions of tasks per second on appropriate nodes while(More)
We argue for breaking data-parallel jobs in compute clusters into tiny tasks that each complete in hundreds of milliseconds. Tiny tasks avoid the need for complex skew mitigation techniques: by breaking a large job into millions of tiny tasks, work will be evenly spread over available resources by the scheduler. Furthermore, tiny tasks alleviate long wait(More)
Large scale streaming systems aim to provide high throughput and low latency. They are often used to run mission-critical applications, and must be available 24x7. Thus such systems need to adapt to failures and inherent changes in workloads, with minimal impact on latency and throughput. Unfortunately, existing solutions require operators to choose between(More)
Large-scale data analytics frameworks are shifting towards shorter task durations and larger degrees of parallelism to provide low latency. However, scheduling highly parallel jobs that complete in hundreds of milliseconds poses a major scaling challenge for cluster schedulers, which will need to place millions of tasks per second on appropriate nodes while(More)
In the concurrent multi-commodity flow problem, we are given a capacitated network G = (V,E) of switches V connected by links E, and a set of commodities K = {(si, ti, di)}. The objective is to maximize the minimum fraction λ of any demand di that is routed from source si to target ti. This problem has been studied extensively by the theory community in the(More)
In today's data analytics frameworks, many users struggle to reason about the performance of their workloads. Without an understanding of what factors are most important to performance, users can't determine what configuration parameters to set and what hardware to use to optimize runtime. This paper explores a system architecture designed to make it easy(More)
Users often struggle to reason about the performance of today's systems. Without an understanding of what factors are most important to performance, users do not know how to tune their system's hardware and software configuration to improve performance. We argue that performance clarity -- making it easy to understand where bottlenecks lie and the(More)
  • 1