A characterization of processor performance in the VAX-11/780

@inproceedings{Emer1998ACO,
  title={A characterization of processor performance in the VAX-11/780},
  author={Joel S. Emer and Douglas W. Clark},
  booktitle={ISCA '98},
  year={1998}
}
This paper reports the results of a study of VAX-11/780 processor performance using a novel hardware monitoring technique. A micro-PC histogram monitor was built for these measurements. It keeps a count of the number of microcode cycles executed at each microcode location. Measurement experiments were performed on live timesharing workloads as well as on synthetic workloads of several types. The histogram counts allow the calculation of the frequency of various architectural events, such as the… 
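The abstract's idea of deriving event frequencies from per-location cycle counts can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the microcode addresses, the counts, the event names, and the simplification that each event's microcode routine is entered at exactly one address, so that the count at that address equals the number of occurrences. It is not the paper's actual data or tooling.

```python
from collections import Counter

# Hypothetical micro-PC histogram: microcode address -> cycles counted there.
# All addresses and counts are invented for illustration.
histogram = Counter({0x010: 500, 0x011: 1200, 0x120: 300, 0x121: 900, 0x200: 100})

# Hypothetical map from event name to the single microcode address that is
# executed once per occurrence of that event (an assumed layout).
EVENT_ENTRY = {
    "instruction_decode": 0x010,
    "tb_miss": 0x200,
}


def event_counts(hist, entries):
    """Read each event's occurrence count off its entry address."""
    return {name: hist.get(addr, 0) for name, addr in entries.items()}


def instruction_count(hist, entries, instruction_event="instruction_decode"):
    """Number of instructions, taken as the count at the decode entry address."""
    n = event_counts(hist, entries)[instruction_event]
    if n == 0:
        raise ValueError("no instructions recorded in the histogram")
    return n


def events_per_instruction(hist, entries):
    """Frequency of each event, normalized by the number of instructions."""
    n = instruction_count(hist, entries)
    return {name: count / n for name, count in event_counts(hist, entries).items()}


def cycles_per_instruction(hist, entries):
    """Total microcode cycles across all locations divided by instructions."""
    return sum(hist.values()) / instruction_count(hist, entries)


print(cycles_per_instruction(histogram, EVENT_ENTRY))             # 6.0
print(events_per_instruction(histogram, EVENT_ENTRY)["tb_miss"])  # 0.2
```

With the invented counts above, 3000 total cycles over 500 decoded instructions gives 6.0 cycles per instruction, and 100 entries into the hypothetical `tb_miss` routine give 0.2 such events per instruction.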
Architectural support for irregular programs and performance monitoring for heterogeneous systems
TLDR
This thesis proposes architectural enhancements to the profiling and workgroup scheduling subsystems of heterogeneous devices and introduces a benchmark suite for heterogeneous systems in which flexibility in behavior is a primary guiding design choice.
Vertically Integrated Computing Labs Using Open-Source Hardware Generators and Cloud-Hosted FPGAs
TLDR
This work describes experiences with a joint hardware-software approach to exploring computer architecture concepts in class exercises, using open-source processor hardware implementations, generator-based hardware design methodologies, and cloud-hosted FPGAs to create a connecting thread between computer science and electrical engineering experience-based curricula.
50 Years of computer architecture: From the mainframe CPU to the domain-specific TPU and the open RISC-V instruction set
  • D. Patterson
  • Computer Science
    2018 IEEE International Solid-State Circuits Conference (ISSCC)
  • 2018
TLDR
This document explains how IBM bet that they could invent a single ISA that would work for customers of all four lines, and how that vision required a new way to build computers that would be binary compatible from the cheapest 8-bit model to the fastest 64-bit version.
The effect of an optical network on-chip on the performance of chip multiprocessors
TLDR
This thesis investigates a simple circuit-switched ONoC with a lower component count, in which nodes must request a channel before transmission, and proposes a coherence-based message predictor that sets up circuits before message arrival to hide the path setup latency.
Memory Access Scheduling for Improving performance
The bandwidth and latency of a memory system are strongly dependent on the manner in which accesses interact with the “3-D” structure of banks, rows, and columns characteristic of contemporary DRAM
Reduced Instruction Set Computers Then and Now
A widely cited Computer article published in 1982 described the reduced instruction set computer (RISC) as an alternative to the general trend at the time toward increasingly complex instruction
Understanding and Improving Graph Algorithm Performance
TLDR
This dissertation characterizes graph processing workloads on shared memory multiprocessors in order to understand graph algorithm performance and introduces the Graph Algorithm Iron Law (GAIL), a simple performance model that allows for reasoning about tradeoffs across layers by considering algorithmic efficiency, cache locality, and memory bandwidth utilization.
Efficient Control and Communication Paradigms for Coarse-Grained Spatial Architectures
TLDR
This article shows that a spatial accelerator using triggered instructions and latency-insensitive channels can achieve 8× greater area-normalized performance than a traditional general-purpose processor.
GAIL: the graph algorithm iron law
TLDR
The Graph Algorithm Iron Law (GAIL) is presented to quantify these tradeoffs to help understand graph algorithm performance.
Just-in-time data structures
TLDR
JitDS, a programming language for developing Just-in-Time Data Structures, which enable representation changes at runtime based on declarative input from a performance expert programmer, is presented.

References

An instruction timing model of CPU performance
A model of high-performance computers is derived from instruction timing formulas, with compensation for pipeline and cache memory effects. The model is used to predict the performance of the IBM
Performance of the VAX-11/780 translation buffer: simulation and measurement
TLDR
The authors present the results of a set of measurements and simulations of the performance of the VAX-11/780 translation buffer, a hardware cache of recently used virtual-to-physical address mappings.
  • ACM TOCS
  • 1983
Cache Performance in the VAX-11/780
TLDR
Measurements are reported including the hit ratios of data and instruction references, the rate of cache invalidations by I/O, and the amount of waiting time due to cache misses.
Comparative Analysis of Computer Architectures
TLDR
The total number of instructions executed shows the VAX architecture to be the most efficient, but measures of the activity required of the interpreter indicate that the S/370 representation is the fastest to interpret. Memory reference behavior indicates that the 8-bit displacement used by the VAX is very effective for local referencing, but that the VAX suffers in referencing global objects.
Performance of the VAX-11/780 Translation Buffer: Simulation and Measurement
  • Submitted for publication,
  • 1983
A case study of VAX-11 instruction set usage for compiler execution
TLDR
This paper looks at dynamic VAX-11 instruction set usage by one class of programs, and discusses the methodology and tools which have been developed to provide that information.
An analysis of a Mesa instruction set using dynamic instruction frequencies
TLDR
An evaluation of the advantages and costs of Mesa's compact byte encoding, its reliance upon an evaluation stack, and its use of memory is provided.