Allen B. Downey

We measure the distribution of lifetimes for UNIX processes and propose a functional form that fits this distribution well. We use this functional form to derive a policy for preemptive migration, and then use a trace-driven simulator to compare our proposed policy with other preemptive migration policies, and with a non-preemptive load balancing strategy.
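
The abstract does not reproduce the functional form, but related work on UNIX process lifetimes reports a Pareto-like tail, roughly Pr[T > t] proportional to 1/t. Under that assumption, a process that has already run for a seconds has a median remaining lifetime of about a seconds, which suggests an age-threshold migration rule. The sketch below is illustrative only, not the paper's policy, and the migration-cost parameter is hypothetical.

```python
# Illustrative sketch, assuming a Pareto-like lifetime tail Pr[T > t] ~ 1/t
# (a form reported for UNIX process lifetimes); this is not the paper's
# exact policy. Under that tail, Pr[T > 2a | T > a] = 1/2, so a process of
# age `a` has median remaining lifetime `a`, and migration pays off only
# for processes old enough to outlive the cost of moving them.

def median_remaining_lifetime(age: float) -> float:
    return age  # property of the assumed 1/t tail, not a general fact

def should_migrate(age: float, migration_cost: float) -> bool:
    """Migrate only if the process will likely outlive its migration cost."""
    return median_remaining_lifetime(age) > migration_cost

for age in (0.5, 2.0, 10.0):
    print(f"age {age:4.1f}s -> migrate: {should_migrate(age, 1.0)}")
```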
We evaluate pathchar, a tool that infers the characteristics of links along an Internet path (latency, bandwidth, queue delays). Looking at two example paths, we identify circumstances where pathchar is likely to succeed, and develop techniques to improve the accuracy of pathchar's estimates and reduce the time it takes to generate…
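
pathchar's underlying inference works roughly as follows: send probes of varying sizes to successive hops, keep the minimum RTT per size, and fit a line rtt = b + m * size at each hop. The increase in slope from one hop to the next estimates 1/bandwidth of the new link, and half the increase in intercept estimates its latency. A self-contained sketch of that per-link inference on synthetic numbers (all values made up):

```python
import numpy as np

# Synthetic example of pathchar-style inference: fit the minimum RTT at each
# hop as a linear function of probe size; the slope increase from hop k-1 to
# hop k estimates 1/bandwidth of link k, and half the intercept increase
# estimates its one-way latency. All numbers below are made up.
sizes = np.array([64.0, 256.0, 512.0, 1024.0, 1500.0])   # probe sizes, bytes

def min_rtt(sizes, per_byte, base):
    """Idealized minimum RTT: fixed base delay plus serialization time."""
    return base + per_byte * sizes

# Hop 1: one 10 Mb/s link (8e-7 s/byte), 2 ms round-trip base delay.
rtt1 = min_rtt(sizes, per_byte=8e-7, base=2e-3)
# Hop 2: adds a 1 Mb/s link (8e-6 s/byte) and 10 ms more round-trip delay.
rtt2 = min_rtt(sizes, per_byte=8e-7 + 8e-6, base=12e-3)

slope1, icpt1 = np.polyfit(sizes, rtt1, 1)
slope2, icpt2 = np.polyfit(sizes, rtt2, 1)

link2_bandwidth = 1.0 / (slope2 - slope1)   # bytes per second
link2_latency = (icpt2 - icpt1) / 2.0       # seconds, one way

print(f"link 2 bandwidth ~ {link2_bandwidth * 8 / 1e6:.2f} Mb/s")
print(f"link 2 latency   ~ {link2_latency * 1e3:.2f} ms")
```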
We propose a user model that explains the shape of the distribution of file sizes in local file systems and in the World Wide Web. We examine evidence from 562 file systems, 38 web clients and 6 web servers, and find that the model is a good description of these systems. These results cast doubt on the widespread view that the distribution of file sizes is…
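
The model itself is not spelled out in this snippet; as a rough illustration of the multiplicative-process idea such user models typically rest on, the sketch below derives each new file from a randomly chosen existing one with a random size factor. Repeated multiplication of independent factors pushes log sizes toward an approximately normal shape, i.e. a lognormal rather than heavy-tailed size distribution. The parameters are made up.

```python
import math
import random

random.seed(1)

# Minimal sketch of a multiplicative file-creation model (an illustrative
# assumption, not the paper's exact model): each new file is derived from a
# randomly chosen existing file, with its size scaled by a random factor.
sizes = [4096.0]                                  # one seed file, in bytes
for _ in range(20000):
    parent = random.choice(sizes)
    factor = math.exp(random.gauss(0.0, 0.8))     # random copy/edit factor
    sizes.append(parent * factor)

logs = [math.log(s) for s in sizes]
mean = sum(logs) / len(logs)
std = (sum((x - mean) ** 2 for x in logs) / len(logs)) ** 0.5
print(f"log-size mean {mean:.2f}, std {std:.2f}")  # roughly bell-shaped logs
```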
We develop a workload model based on the observed behavior of parallel computers at the San Diego Supercomputer Center and the Cornell Theory Center. This model gives us insight into the performance of strategies for scheduling moldable jobs on space-sharing parallel computers. We find that Adaptive Static Partitioning (ASP), which has been reported to work…
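
As a point of reference for what ASP does, a minimal reading is: when processors are free, divide them evenly among the queued moldable jobs, fixing each job's partition size at start time. The sketch below implements that reading; the even-share rule and the per-job cap are assumptions for illustration, not the paper's definition.

```python
# Minimal sketch of an Adaptive Static Partitioning (ASP) style allocator
# under an assumed even-share rule: when processors free up, divide them
# evenly among the queued moldable jobs, capping each job at its maximum
# usable parallelism. Each allocation is fixed at start time (static), but
# the partition size adapts to the current load.

def asp_allocate(free_procs: int, queued_max_procs: list[int]) -> list[int]:
    """Return a processor grant for each queued job (0 = stays queued)."""
    if not queued_max_procs:
        return []
    share = max(1, free_procs // len(queued_max_procs))
    allocs = []
    for max_procs in queued_max_procs:
        grant = min(share, max_procs, free_procs)
        allocs.append(grant)
        free_procs -= grant
    return allocs

print(asp_allocate(64, [32, 8, 100]))  # -> [21, 8, 21] under this sketch
```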
We review evidence that Internet traffic is characterized by long-tailed distributions of interarrival times, transfer times, burst sizes, and burst lengths. We propose a new statistical technique for identifying long-tailed distributions, and apply it to a variety of datasets collected on the Internet. We find that there is little evidence that…
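
The standard first diagnostic in this area is the complementary CDF on log-log axes: a long-tailed (Pareto-like) sample shows a straight tail with slope −α, while a lognormal bends downward. The sketch below fits that tail slope on synthetic data; it shows the generic diagnostic, not the new technique the abstract proposes.

```python
import numpy as np

def tail_slope(samples: np.ndarray, tail_fraction: float = 0.1) -> float:
    """Estimate the log-log CCDF slope over the largest observations.

    A roughly constant slope -alpha with small alpha (e.g. < 2) is the
    usual signature of a long-tailed, Pareto-like distribution.
    """
    x = np.sort(samples)
    n = len(x)
    ccdf = 1.0 - np.arange(1, n + 1) / n
    k = int(n * tail_fraction)
    xs, ys = np.log(x[-k:-1]), np.log(ccdf[-k:-1])  # drop last point (ccdf=0)
    slope, _ = np.polyfit(xs, ys, 1)
    return slope

rng = np.random.default_rng(0)
pareto = rng.pareto(1.2, 50000) + 1.0          # true tail index 1.2
lognorm = rng.lognormal(0.0, 1.0, 50000)
print(f"pareto tail slope    {tail_slope(pareto):.2f}")   # near -1.2
print(f"lognormal tail slope {tail_slope(lognorm):.2f}")  # steeper, curved
```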
We present statistical techniques for predicting the queue times experienced by jobs submitted to a space-sharing parallel machine with first-come-first-served (FCFS) scheduling. We apply these techniques to trace data from the Intel Paragon at the San Diego Supercomputer Center and the IBM SP2 at the Cornell Theory Center. We show that it is possible to predict…
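
Concretely, under FCFS space-sharing a job's queue time is determined by when enough processors free up, so a prediction can be computed by replaying the queue against estimated runtimes of the running and queued jobs. The sketch below does that replay; the runtime estimates are plain inputs here, standing in for the statistical estimates the paper derives from trace data.

```python
import heapq

# Sketch of queue-time prediction under FCFS space-sharing: replay the queue
# using estimated remaining times of running jobs and estimated runtimes of
# queued jobs. Producing those estimates from traces is the statistical part.

def predict_wait(total_procs, running, queued, new_job_procs):
    """running: [(procs, est_remaining)]; queued: [(procs, est_runtime)]."""
    free = total_procs - sum(p for p, _ in running)
    ends = [(remaining, p) for p, remaining in running]  # (finish, procs)
    heapq.heapify(ends)
    now = 0.0
    for procs, runtime in queued:        # jobs ahead of us start first
        while free < procs:
            now, freed = heapq.heappop(ends)
            free += freed
        free -= procs
        heapq.heappush(ends, (now + runtime, procs))
    while free < new_job_procs:          # then wait for our own processors
        now, freed = heapq.heappop(ends)
        free += freed
    return now

# 100-processor machine: two running jobs, one 80-processor job ahead of us.
print(predict_wait(100, running=[(60, 5.0), (40, 2.0)],
                   queued=[(80, 3.0)], new_job_procs=50))   # -> 8.0
```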
We propose a new model for parallel speedup that is based on two parameters, the average parallelism of a program and its variance in parallelism. We present a way to use the model to estimate these program characteristics using only observed speedup curves (as opposed to the more detailed program knowledge otherwise required). We apply this method to…
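
To make the two-parameter idea concrete, the sketch below uses one simple assumed family, not necessarily the paper's model: speedup is near-linear while n is small relative to the average parallelism A, degrades faster as the variance parameter sigma grows, and saturates at A. Fitting A and sigma to an observed speedup curve would then recover the program characteristics, as the abstract describes.

```python
# Illustrative two-parameter speedup family (an assumed form, not necessarily
# the paper's exact model): A is the average parallelism, sigma a measure of
# variance in parallelism. Low sigma gives near-linear speedup up to roughly
# A processors; high sigma bends the curve over sooner. Speedup caps at A.

def speedup(n: float, A: float, sigma: float) -> float:
    return min(A, A * n / (A + sigma * (n - 1) / 2.0))

for n in (1, 8, 32, 128):
    print(n, round(speedup(n, A=32.0, sigma=0.5), 1),
             round(speedup(n, A=32.0, sigma=2.0), 1))
```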
Numerous studies have reported long-tailed distributions for various network metrics, including file sizes, transfer times, and burst lengths. We review techniques for identifying long-tailed distributions based on a sample, propose a new technique, and apply these methods to datasets used in previous reports. We find that the evidence for long tails is…
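
One concrete way such evidence gets weighed (a generic comparison, not necessarily the paper's proposed technique) is to fit both a Pareto and a lognormal to the same sample by maximum likelihood and compare log-likelihoods; samples that look long-tailed on a log-log plot are often fit as well or better by a lognormal. A numpy-only sketch on synthetic data:

```python
import numpy as np

# Fit a lognormal and a Pareto to the same sample by maximum likelihood and
# compare log-likelihoods. Generic diagnostic; all parameters are made up.
rng = np.random.default_rng(1)
sample = rng.lognormal(mean=8.0, sigma=2.0, size=5000)   # e.g. transfer times

logs = np.log(sample)
mu, s = logs.mean(), logs.std()                          # lognormal MLE
ll_lognormal = np.sum(-logs - np.log(s * np.sqrt(2 * np.pi))
                      - (logs - mu) ** 2 / (2 * s * s))

x_m = sample.min()                                       # Pareto MLE
alpha = len(sample) / np.sum(np.log(sample / x_m))
ll_pareto = np.sum(np.log(alpha) + alpha * np.log(x_m)
                   - (alpha + 1) * np.log(sample))

print(f"lognormal log-likelihood {ll_lognormal:.0f}")    # better fit here
print(f"pareto    log-likelihood {ll_pareto:.0f}")
```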
The study and design of computer systems requires good models of the workload to which these systems are subjected. Until recently, the data necessary to build these models---observations from production installations---were not available, especially for parallel computers. Instead, most models were based on assumptions and mathematical attributes that…
When a malleable job is submitted to a space-sharing parallel computer, it must often choose whether to begin execution on a small, available cluster or wait in the queue for more processors to become available. To make this decision, it must predict how long it will have to wait for the larger cluster. We propose statistical techniques for predicting these…
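
The decision the abstract describes reduces to a comparison of predicted finish times: run on the small cluster now, or wait a predicted time and run faster on the large one. The sketch below shows just that comparison; the runtime and wait estimates are inputs, standing in for the statistical predictors the paper proposes.

```python
# Minimal sketch of the start-now-or-wait decision for a malleable job.
# The estimates are hypothetical inputs; predicting `predicted_wait` from
# the queue state is the statistical problem the paper addresses.

def should_wait(runtime_small: float, runtime_large: float,
                predicted_wait: float) -> bool:
    """Wait for the large cluster only if it yields an earlier finish."""
    return predicted_wait + runtime_large < runtime_small

# e.g. 10h on the small free cluster now, vs 4h on the large one after a
# predicted 3h wait: waiting finishes at hour 7, so it wins.
print(should_wait(runtime_small=10.0, runtime_large=4.0, predicted_wait=3.0))
```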