Allan Snavely

Learn More
Simultaneous Multithreading machines fetch and execute instructions from multiple instruction streams to increase system utilization and speedup the execution of jobs. When there are more jobs in the system than there is hardware to support simultaneous execution, the operating system scheduler must choose the set of jobs to coscheduleThis paper(More)
Simultaneous Multithreading machines benefit from jobscheduling software that monitors how well coscheduled jobs share CPU resources, and coschedules jobs that interact well to make more efficient use of those resources. As a result, informed coscheduling can yield significant performance gains over naive schedulers. However, prior work on coscheduling(More)
Binary instrumentation facilitates the insertion of additional code into an executable in order to observe or modify the executable's behavior. There are two main approaches to binary instrumentation: static and dynamic binary instrumentation. In this paper we present a static binary instrumentation toolkit for Linux on the x86/x86_64 platforms, PEBIL(More)
The size of supercomputers in numbers of processors is growing exponentially. Today’s largest supercomputers have upwards of a hundred thousand processors and tomorrow’s may have on the order one million. The applications that run on these systems commonly coordinate their parallel activities via MPI; a trace of these MPI communication events is an(More)
Cycle-accurate simulation is far too slow for modeling the expected performance of full parallel applications on large HPC systems. And just running an application on a system and observing wallclock time tells you nothing about why the application performs as it does (and is anyway impossible on yet-to-be-built systems). Here we present a framework for(More)
Laura Carrington, Dimitri Komatitsch, Michael Laurenzano, Mustafa M Tikir , David Michéa, Nicolas Le Goff, Allan Snavely , Jeroen Tromp a Performance Modeling and Characterization Lab, San Diego Supercomputer Center, La Jolla, CA, USA b Université de Pau, CNRS and INRIA Magique-3D, Laboratoire de Modélisation et d'Imagerie en Géosciences, Pau, France c(More)
This paper presents a performance modeling methodology that is faster than traditional cycle-accurate simulation, more sophisticated than performance estimation based on system peak-performance metrics, and is shown to be effective on a class of High Performance Computing benchmarks. The method yields insight into the factors that affect performance on(More)
The Tera MTA is a revolutionary commercial computer based on a multithreaded processor architecture. In contrast to many other parallel architectures, the Tera MTA can effectively use high amounts of parallelism on a single processor. By running multiple threads on a single processor, it can tolerate memory latency and to keep the processor saturated. If(More)