Computer technology and architecture: an evolving interaction

  • John L. Hennessy, Norman P. Jouppi
The interaction between computer architecture and IC technology is examined. To evaluate the attractiveness of particular technologies, computer designs are assessed primarily on the basis of performance and cost. The focus is mainly on CPU performance, both because it is easier to measure and because the impact of technology is most easily seen in the CPU. The technology trends discussed concern memory size, design complexity and time, and design scaling. Architectural trends in the areas of… 


A universal parallel computer architecture

  • W. Dally
  • Computer Science
    New Generation Computing
  • 2009
The technology and architecture trends motivating fine-grain architectures are described, along with the enabling technologies of high-performance interconnection networks and low-overhead interaction mechanisms, followed by a discussion of the J-Machine, a prototype fine-grain concurrent computer.

The case for a single-chip multiprocessor

It is shown that in advanced technologies it is possible to implement a single-chip multiprocessor in the same area as a wide issue superscalar processor, and it is found that for applications with little parallelism the performance of the two microarchitectures is comparable.

Multithreaded Architectures: Principles, Projects, and Issues

Multithreaded processing element architectures are a promising alternative to RISC architecture and its multiple-instruction-issue extensions such as VLIW, superscalar, and superpipelined architectures.

Interconnecting Computers: Architecture, Technology, and Economics

  • B. Lampson
  • Computer Science
    Programming Languages and System Architectures
  • 1994
Modern computer systems have a recursive structure of processing and storage elements that are interconnected to make larger elements, such as functional units connected to registers and on-chip cache.

Monster : a tool for analyzing the interaction between operating systems and computer architectures

The need for OS performance evaluation tools is argued, previous hardware- and software-based monitoring techniques are summarized, the design of Monster is discussed, and an analysis of compilation workloads that tests and demonstrates Monster's capabilities is presented.

The M-Machine multicomputer

The M-Machine is an experimental multicomputer being developed to test architectural concepts motivated by the constraints of modern semiconductor technology and the demands of programming systems. The architecture of the M-Machine is presented, and how its mechanisms attempt to maximize both single-thread performance and overall system throughput is described.

MemSpy: analyzing memory system bottlenecks in programs

MemSpy is described, a prototype tool that helps programmers identify and fix memory bottlenecks in both sequential and parallel programs and introduces the notion of data oriented, in addition to code oriented, performance tuning.

Rationale, Design and Performance of the Hydra Multiprocessor

Initial estimates show the interprocessor communication latencies to be much lower than those of current bus-based multiprocessors, which should result in higher performance on applications with fine-grained parallelism.

Performance evaluation of parallel applications on multiprocessor systems on chip

  • O. Hammami, G. Tian
  • Computer Science, Engineering
    MELECON 2008 - The 14th IEEE Mediterranean Electrotechnical Conference
  • 2008
An MPSoC based on a network-on-chip (NoC) is developed and presented, and results are compared between the NoC-based multiprocessor system and a traditional bus-based MPSoC.



Computer Architecture: A Quantitative Approach

This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important…

The Nonuniform Distribution of Instruction-Level and Machine Parallelism and Its Effect on Performance

A methodology for quickly estimating machine performance is developed and is shown to be accurate to within 15% for three widely different machine pipelines: the CRAY-1, the MultiTitan, and a dual-issue superscalar machine.

Using cache memory to reduce processor-memory traffic

It is demonstrated that a cache exploiting primarily temporal locality (look-behind) can indeed greatly reduce traffic to memory, and an elegant solution to the cache coherency problem is introduced.
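The look-behind idea above can be made concrete with a toy simulation (a minimal sketch; the trace, line size, and function name are illustrative assumptions, not taken from the paper): a small fully associative LRU cache absorbs repeated touches to recently used lines, so only first-time line fetches reach memory.

```python
from collections import OrderedDict

def memory_traffic(trace, cache_lines=4, line_bytes=16):
    """Count accesses that must go to memory given a small fully
    associative LRU cache exploiting temporal locality (assumed
    parameters: 4 lines of 16 bytes each)."""
    cache = OrderedDict()              # line address -> None, in LRU order
    misses = 0
    for addr in trace:
        line = addr // line_bytes
        if line in cache:
            cache.move_to_end(line)    # hit: refresh LRU position
        else:
            misses += 1                # miss: fetch the line from memory
            cache[line] = None
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict least recently used
    return misses

# A loop re-touching the same line: only the first access goes to memory.
print(memory_traffic([0, 4, 8, 12] * 3))  # prints 1
```

With a trace of all-distinct lines, every access misses, which is the traffic a cache of this kind is meant to avoid.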

A case for direct-mapped caches

  • M. Hill
  • Computer Science
  • 1988
Direct-mapped caches are defined, and it is shown that trends toward larger cache sizes and faster hit times favor their use. The arguments are restricted initially to single-level caches in…
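The fast hit time Hill argues for comes from each block mapping to exactly one cache frame, so a lookup needs a single tag comparison. A minimal sketch of the standard address split (the function and parameter names are illustrative, not from the paper):

```python
def direct_mapped_lookup(addr, num_sets=64, block_size=32):
    """Split a byte address into (tag, index, offset) for a
    direct-mapped cache with the assumed geometry: 64 sets of
    32-byte blocks (a 2 KB cache)."""
    offset = addr % block_size                 # byte within the block
    index = (addr // block_size) % num_sets    # the one frame to check
    tag = addr // (block_size * num_sets)      # compared on lookup
    return tag, index, offset

print(direct_mapped_lookup(0x1234))  # prints (2, 17, 20)
```

Because `index` selects a unique frame, the data array can be read in parallel with the single tag check, unlike a set-associative cache, which must compare several tags and then select among ways.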

IBM second-generation RISC machine organization

  • H. Bakoglu, G. Grohoski, S. Dhawan
  • Computer Science
    Digest of Papers Compcon Spring '90. Thirty-Fifth IEEE Computer Society International Conference on Intellectual Leverage
  • 1990
A highly concurrent second-generation superscalar reduced-instruction-set computer (RISC) is described. It combines a powerful RISC architecture with sophisticated hardware design techniques to…

Limits on multiple instruction issue

This paper investigates the limitations on designing a processor that can sustain an execution rate of greater than one instruction per cycle on highly optimized, non-scientific applications, and determines that these applications contain enough instruction independence to sustain a rate of about two instructions per cycle.
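The "enough instruction independence" claim can be illustrated with a toy dataflow-limit model (a sketch under strong assumptions: unit-latency instructions, unbounded issue width, only true data dependences; this is not the paper's methodology): the best sustainable rate is the instruction count divided by the length of the longest dependence chain.

```python
def dataflow_ipc(deps):
    """deps[i] lists the instructions that instruction i depends on.
    Schedule each instruction in the earliest cycle after all its
    producers; IPC = instructions / critical-path length."""
    cycle = {}
    for i, producers in enumerate(deps):
        cycle[i] = 1 + max((cycle[p] for p in producers), default=0)
    return len(deps) / max(cycle.values())

# A 4-long dependent chain interleaved with 4 independent instructions:
deps = [[], [0], [1], [2], [], [], [], []]
print(dataflow_ipc(deps))  # 8 instructions over a 4-cycle chain -> 2.0
```

Real machines fall below this bound once finite issue width, branch prediction, and memory latency are modeled, which is what the limit studies in this list quantify.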

Limits of instruction-level parallelism

The results of simulations of 18 different test programs under 375 different models of available parallelism are presented, showing how simulations based on instruction traces can model techniques at the limits of feasibility and even beyond.

Cache and memory hierarchy design: a performance-directed approach

This work presents a methodology that systematizes the labor-intensive, time-consuming, and expensive process of designing and implementing cache and memory hierarchies.

High-bandwidth data memory systems for superscalar processors

Multi-port, nonblocking (MPNB) L1 caches introduced in this paper for the top of the data memory hierarchy appear to be capable of supporting the bandwidth demands of future-generation superscalar processors.

Distributed-directory scheme: scalable coherent interface

The scalable coherent interface (SCI), a local or extended computer backplane interface being defined by an IEEE standard project (P1596), is discussed, and request combining, a useful feature of linked-list coherence, is described.
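The linked-list coherence idea can be sketched as follows (a greatly simplified assumption-laden toy: the real P1596 protocol uses doubly linked lists, many states, and rollout operations; all names here are hypothetical). Memory keeps only a head pointer per line, and sharers chain to one another, so directory storage per line stays constant regardless of sharer count.

```python
class SCILine:
    """Toy single-linked sharing list for one memory line."""
    def __init__(self):
        self.head = None                 # memory's pointer to newest sharer

    def attach(self, node_id, next_ptrs):
        # A new reader prepends itself to the sharing list; memory only
        # ever updates its single head pointer.
        next_ptrs[node_id] = self.head
        self.head = node_id

    def sharers(self, next_ptrs):
        # Walk the list (e.g., to invalidate all sharers on a write).
        out, n = [], self.head
        while n is not None:
            out.append(n)
            n = next_ptrs[n]
        return out

ptrs = {}
line = SCILine()
for node in (3, 7, 1):
    line.attach(node, ptrs)
print(line.sharers(ptrs))  # prints [1, 7, 3] -- newest sharer first
```

Contrast this with a full-map directory, whose per-line state grows with the number of nodes; the linked-list scheme trades that storage for longer invalidation walks along the chain.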