Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein.
We describe a methodology that enables the real-time diagnosis of performance problems in complex high-performance distributed systems. The methodology includes tools for generating precision event logs that can be used to provide detailed end-to-end application and system level monitoring; a Java agent-based system for managing the large amount of logging…
We have designed, built, and analyzed a distributed parallel storage system that will supply image streams fast enough to permit multi-user, “real-time”, video-like applications in a wide-area ATM network-based Internet environment. We have based the implementation on user-level code in order to secure portability; we have characterized the…
We describe an approach to the analysis of the performance of distributed applications in high-speed wide-area networks. The approach is designed to identify all of the issues that impact performance, and isolate the causes due to the related hardware and software components. We also describe the use of a distributed parallel data server as a network load…
As the practice of science moves beyond the single investigator due to the complexity of the problems that now dominate science, large collaborative and multi-institutional teams are needed to address these problems. In order to support this shift in science, the computing and data handling infrastructure that is essential to most of modern science must…