Many network monitoring tools are already available to measure a variety of metrics. However, these tools are often limited to a single administrative domain, so no established methodology currently exists for monitoring network connections that span multiple domains. In addition, these tools only monitor the network from …
This paper analyzes and evaluates some novel latency hiding features of the KSR1 multiprocessor: prefetch and poststore instructions and automatic updates. As a case study, we analyze the performance of an iterative sparse solver which generates irregular communications. We show that automatic updates significantly reduce the amount of communication. …
We propose the creation of a multi-domain measurement framework with dynamic characteristics identical to those of the network as a whole. Our approach recognises and facilitates the ability of independent network entities to set policies and limits on the use of measurement resources locally while encouraging and facilitating the use of such resources by …
We have developed a hierarchical performance bounding methodology that attempts to explain the performance of loop-dominated scientific applications on particular systems. The Kendall Square Research KSR1 is used as a running example. We model the throughput of key hardware units that are common bottlenecks in concurrent machines. The four units currently …
Communication has a dominant impact on the performance of massively parallel processors (MPPs). We propose a methodology to evaluate the internode communication performance of MPPs using a controlled set of synthetic workloads. By generating a range of sparse matrices and measuring the performance of a simple parallel algorithm that repeatedly multiplies a …
A widely distributed network monitoring system requires a scalable discovery mechanism. The "Lookup Service" component of the perfSONAR framework is able to manage component registration, distill resource data into tractable units, and respond to queries regarding system and performance information. A model of organizing and distributing information is …
The MACS performance model introduced here can be applied to a Machine and Application of interest, the Compiler-generated workload, and the Scheduling of the workload by the compiler. The MA, MAC, and MACS bounds each fix the named subset of M, A, C, and S while freeing the bound from the constraints imposed by the others. A/X performance measurement is …
We have developed an automatic technique for evaluating the communication performance of massively parallel processors (MPPs). Both communication latency and the amount of communication are investigated as a function of a few basic parameters that characterize an application workload. Parameter values are captured in an automatically generated sparse matrix …
A methodology for performance analysis of Massively Parallel Processors (MPPs) is presented. The IBM SP2 and some key routines of a finite element method application (FEMC) are used as a case study. A hierarchy of lower bounds on run time is developed for the POWER2 processor, using the MACS methodology developed in earlier work for uniprocessors and vector …