Characterization of Alpha AXP performance using TP and SPEC workloads

@inproceedings{Cvetanovic1994CharacterizationOA,
  title={Characterization of Alpha AXP performance using TP and SPEC workloads},
  author={Zarka Cvetanovic and Dileep Bhandarkar},
  booktitle={ISCA '94},
  year={1994}
}
The characteristics of several commercial and technical workloads on the DEC 7000 AXP system are compared using built-in hardware monitors. The data analyzed include total instructions, cycles, multiple-issued instructions, stall components, cache misses, and instruction types. The data indicate that the two classes of workloads have vastly different characteristics and impose different requirements on the system design. Compared to VAX, Alpha AXP takes advantage of lower cycles per…
Performance characterization of the Alpha 21164 microprocessor using TP and SPEC workloads
  • Z. Cvetanovic, D. Bhandarkar
  • Computer Science
    Proceedings. Second International Symposium on High-Performance Computer Architecture
  • 1996
TLDR
The AlphaServer 8200 provides 2 to 3 times the performance of the DEC 7000 server based on the faster clock, larger on-chip cache, expanded multiple-issuing, and lower cache/memory latencies and higher bandwidth.
Comparing and contrasting a commercial OLTP workload with CPU2000 on IPF
TLDR
The results show that while IPF's bundle constraints cause a large injection of NOPs into the code stream, IPF's register stack engine successfully reduces the number of memory operations by nearly 50% and the control-flow predictability of ODB is better than CPU2000, in spite of ODB's large active branch footprint.
Accounting for the performance of Standard ML on the DEC
TLDR
Surprisingly, the processor is stalled for 60-70% of the total cycles executed for most benchmarks, with resource conflicts and data cache misses accounting for a large fraction of these.
Contrasting characteristics and cache performance of technical and multi-user commercial workloads
TLDR
The data presented shows that increasing the associativity of second-level caches can reduce miss rates significantly and should help system designers choose a cache configuration that will perform well in commercial markets.
Program balance and its impact on high performance RISC architectures
TLDR
This paper presents studies on the balance of access and computation tasks on a typical RISC architecture, the MIPS, and discusses how these instruction stream characteristics can limit the instruction issue in superscalar processors.
Studies of Windows NT performance using dynamic execution traces
TLDR
It is concluded that processor bandwidth can be a first-order bottleneck to achieving good performance when studying commercial benchmarks, and operating system code and data structures contribute disproportionately to the memory access load.
Evaluation of Existing Architectures in IRAM Systems
TLDR
This work examined both execution time analyses of existing microprocessors and system simulation of hypothetical processors to determine whether existing microarchitectures can tap the potential performance advantages of IRAM systems.
Instruction fetching: Coping with code bloat
TLDR
Evidence is presented that current software-development practices produce applications that exhibit substantially higher instruction-cache miss ratios than do the SPEC benchmarks, and a collection of applications, called the instruction benchmark suite (IBS), that provides a better test of instruction-cache performance.
PERFORMANCE ANALYSIS OF ALPHASERVER GS 1280 White Paper
TLDR
This paper evaluates performance characteristics of the HP AlphaServer GS1280 shared-memory multiprocessor system, comparing and contrasting it with the previous-generation Alpha systems, as well as other-vendor systems.
Effectiveness and Limitations of Embedded Counter Based Performance Analysis
TLDR
The results show that the Pentium Pro memory subsystem and branch prediction are performing well, but the UOP per cycle and IPC numbers are low, and it is speculated that the data hazards present between instructions and UOPs are the cause for the poor performance.

References

SHOWING 1-10 OF 22 REFERENCES
Transaction processing performance on PA-RISC commercial Unix systems
TLDR
The authors briefly compare and contrast selected 'architectural results' and 'implementation results' from three diverse applications: transaction processing, general-purpose, and technical/scientific.
Instruction level profiling and evaluation of the IBM RS/6000
TLDR
Preliminary results from using goblin, a new instruction level profiling system, to evaluate the IBM RISC System/6000 architecture indicate that for the SPEC benchmark suite the architecture of the RS/6000 is well balanced and exhibits impressive performance, especially on the floating-point intensive applications.
Measuring VAX 8800 performance with a histogram hardware monitor
TLDR
This paper reports the results of a study of VAX 8800 processor performance using a hardware monitor that collects histograms of the processor's micro-PC and memory bus status, which yields a very detailed picture of the amount of time an average VAX instruction spends in various activities on the 8800.
How does processor MHz relate to end-user performance? II. Memory subsystem and instruction set
TLDR
It is shown that performance measurements on many systems support the initial claim that cycle time is not sufficient to determine performance.
DEC 7000/10000 Model 600 AXP multiprocessor server
  • B. Allison
  • Computer Science
    Digest of Papers. Compcon Spring
  • 1993
TLDR
The DEC 7000 and 10000 products are mid-range and mainframe Alpha AXP system offerings that combine high-speed chips, large caches, multiprocessor system architecture, high-performance backplane interconnect, and large memory capacity to create mainframe-class performance with a cost and size previously attributed to mid-range systems.
Performance from architecture: comparing a RISC and a CISC with similar hardware organization
TLDR
This paper compares an example implementation from the RISC and CISC architectural schools (a MIPS M/2000 and a Digital VAX 8700) on nine of the ten SPEC benchmarks and demonstrates the correlation between cycles per instruction and relative instruction count.
Cache performance of the SPEC92 benchmark suite
TLDR
The authors consider whether SPECmarks, the figures of merit obtained from running the SPEC benchmarks under certain specified conditions, accurately indicate the performance to be expected from real, live workloads, and it is found that instruction cache miss ratios in general, and data cache miss ratios for the integer benchmarks, are quite low.
HP's PA7100LC: a low-cost superscalar PA-RISC processor
TLDR
A new low-cost, superscalar PA-RISC processor including two integer arithmetic and logic units, a floating-point coprocessor, and a memory and I/O controller on a single VLSI chip that achieves performance levels comparable to those of previous generation high-end workstations while lowering overall system cost and power consumption to make possible a new generation of low-cost systems.
A measure of transaction processing power
TLDR
These benchmarks measure the performance of diverse transaction processing systems and a standard system cost measure is stated and used to define price/performance metrics.
AlphaSort: a RISC machine sort
TLDR
A new sort algorithm, called AlphaSort, demonstrates that commodity processors and disks can handle commercial batch workloads and proposes two new benchmarks: MinuteSort (how much can you sort in a minute) and DollarSort (how much can you sort for a dollar).