• Publications
  • Influence
AlphaSort: a RISC machine sort
TLDR
A new sort algorithm, called AlphaSort, demonstrates that commodity processors and disks can handle commercial batch workloads and proposes two new benchmarks: Minutesort: how much can you sort in a minute, and DollarSort: how to sort for a dollar.
Extending OpenMP For NUMA Machines
TLDR
Extensions to OpenMP Fortran that implemen data placemen features needed for NUMA architectures are described and some of the techniques that the Compaq Fortran compiler uses to generate efficient code based on these extensions are described.
Performance analysis of the Alpha 21264-based Compaq ES40 system
TLDR
It is found that the Compaq ES40 often provides 2 to 3 times the performance of the AlphaServer 4100 at similar clock frequencies, and the ES40 memory system has about five times the memory bandwidth of the 4100.
Performance analysis of the Alpha 21364-based HP GS1280 multiprocessor
TLDR
It is found that the HP GS1280 often provides 2 to 3 times the performance of the AlphaServer GS320 at similar clock frequencies, and the key reasons are advances in memory, inter-processor, and I/O subsystem designs.
Characterization of alpha AXP performance using TP and SPEC workloads
TLDR
A simple model for evaluating the effects of various design tradeoffs based on the data collected by using hardware monitors is proposed and indicates that Alpha AXP takes advantage of lower cycles per instruction and cycle time to achieve a significant performance advantage.
The Effects of Problem Partitioning, Allocation, and Granularity on the Performance of Multiple-Processor Systems
  • Z. Cvetanovic
  • Computer Science
    IEEE Transactions on Computers
  • 1 April 1987
TLDR
The results indicate that for algorithms where both the computation and the communication overhead can be fully decomposed among N processors, the speedup is a nondecreasing function of the level of granularity for arbitrary interconnection structure and allocation of subproblems to processors.
Alphasort: A cache-sensitive parallel external sort
TLDR
A new sort algorithm, called AlphaSort, demonstrates that commodity processors and disks can handle commercial batch workloads and argues that modern architectures require algorithm designers to re-examine their use of the memory hierarchy.
Performance characterization of the Alpha 21164 microprocessor using TP and SPEC workloads
TLDR
The AlphaServer 8200 provides 2 to 3 times the performance of the DEC 7000 server based on the faster clock, larger on-chip cache, expanded multiple-issuing, and lower cache/memory latencies and higher bandwidth.
Performance Analysis of the FFT Algorithm on a Shared-Memory Parallel Architecture
TLDR
The results indicate that the communication delay is significantly affected by the method applied to allocate data to memory modules, and the communication time complexity is increased to O(log N) since all N requests generated by processors are serialized at a single memory module.
AlphaServer 4100 Performance Characterization
1. AlphaServer 4100 model 5/300E, which has up to four 300-megahertz (MHz) Alpha 21164 microprocessors, each without a module-level, thirdlevel, write-back cache (B-cache) (a design referred to as
...
1
2
3
...