Zarka Cvetanovic

A new sort algorithm, called AlphaSort, demonstrates that commodity processors and disks can handle commercial batch workloads. Using Alpha AXP processors, commodity memory, and arrays of SCSI disks, AlphaSort runs the industry-standard sort benchmark in seven seconds. This beats the best published record on a 32-CPU 32-disk Hypercube by 8:1. On another…
This paper describes extensions to OpenMP that implement the data placement features needed for NUMA architectures. OpenMP is a collection of compiler directives and library routines used to write portable parallel programs for shared-memory architectures. Writing efficient parallel programs for NUMA architectures, which have characteristics of both…
The performance analysis examined a number of workloads with different characteristics, including the SPEC95 benchmark suites (floating-point and integer), the LINPACK benchmark, AIM Suite VII (UNIX multiuser benchmark), the TPC-C transaction processing benchmark, image rendering, and memory latency and bandwidth tests. Note that both commercial…
This paper evaluates performance characteristics of the Compaq ES40 shared memory multiprocessor. The ES40 system contains up to four Alpha 21264 CPUs together with a high-performance memory system. We qualitatively describe architectural features included in the 21264 microprocessor and the surrounding system chipset. We further quantitatively…
The characteristics of several commercial and technical workloads on the DEC 7000 AXP system are compared using built-in hardware monitors. The data analyzed include total instructions, cycles, multiple-issued instructions, stall components, cache misses, and instruction types. The data indicate that the two classes of workloads have vastly different…
A new sort algorithm, called AlphaSort, demonstrates that commodity processors and disks can handle commercial batch workloads. Using commodity processors, memory, and arrays of SCSI disks, AlphaSort runs the industry-standard sort benchmark in seven seconds. This beats the best published record on a 32-CPU 32-disk Hypercube by 8:1. On another benchmark,…
In this paper, we describe the decomposition of six algorithms: two partial differential equation (PDE) solvers (successive over-relaxation [SOR] and alternating direction implicit [ADI]), fast Fourier transform (FFT), Monte Carlo simulations, Simplex linear programming, and sparse solvers. The algorithms were selected not only because of their importance…