Computer technology and architecture: an evolving interaction

@article{Hennessy1991ComputerTA,
  title={Computer technology and architecture: an evolving interaction},
  author={J. Hennessy and N. Jouppi},
  journal={Computer},
  year={1991},
  volume={24},
  pages={18-29}
}
The interaction between computer architecture and IC technology is examined. To evaluate the attractiveness of particular technologies, computer designs are assessed primarily on the basis of performance and cost. The focus is mainly on CPU performance, both because it is easier to measure and because the impact of technology is most easily seen in the CPU. The technology trends discussed concern memory size, design complexity and time, and design scaling. Architectural trends in the areas of…
A universal parallel computer architecture
  • W. Dally
  • Computer Science
  • New Generation Computing
  • 2009
TLDR
The technology and architecture trends motivating fine-grain architecture and the enabling technologies of high-performance interconnection networks and low-overhead interaction mechanisms are described, and the J-Machine, a prototype fine-grain concurrent computer, is discussed.
The case for a single-chip multiprocessor
TLDR
It is shown that in advanced technologies it is possible to implement a single-chip multiprocessor in the same area as a wide issue superscalar processor, and it is found that for applications with little parallelism the performance of the two microarchitectures is comparable.
Multithreaded Architectures: Principles, Projects, and Issues
TLDR
Multithreaded processing element architectures are a promising alternative to RISC architecture and its multiple-instruction-issue extensions such as VLIW, superscalar, and superpipelined architectures.
Interconnecting Computers: Architecture, Technology, and Economics
  • B. Lampson
  • Computer Science
  • Programming Languages and System Architectures
  • 1994
Modern computer systems have a recursive structure of processing and storage elements that are interconnected to make larger elements: Functional units connected to registers and on-chip cache.
Monster : a tool for analyzing the interaction between operating systems and computer architectures
TLDR
The need for OS performance evaluation tools is argued, previous hardware- and software-based monitoring techniques are summarized, the design of Monster is discussed, and an analysis of compilation workloads that test and demonstrate Monster's capabilities is presented.
The M-Machine multicomputer
TLDR
The architecture of the M-Machine is presented, and the ways its mechanisms attempt to maximize both single-thread performance and overall system throughput are described.
The M-machine multicomputer
TLDR
The architecture of the M-Machine is presented, and the ways its mechanisms attempt to maximize both single-thread performance and overall system throughput are described.
MemSpy: analyzing memory system bottlenecks in programs
TLDR
MemSpy, a prototype tool that helps programmers identify and fix memory bottlenecks in both sequential and parallel programs, is described, and the notion of data-oriented, in addition to code-oriented, performance tuning is introduced.
Rationale, Design and Performance of the Hydra Multiprocessor
TLDR
Initial estimates of the interprocessor communication latencies show them to be much better than current bus-based multiprocessors, which will result in higher performance on applications with fine grained parallelism.
Performance evaluation of parallel applications on multiprocessor systems on chip
  • O. Hammami, G. Tian
  • Computer Science
  • MELECON 2008 - The 14th IEEE Mediterranean Electrotechnical Conference
  • 2008
TLDR
An MPSOC based on Network-on-Chip (NoC) is developed and presented, and results are compared between the NoC-based multiprocessor system and the traditional bus-based MPSOC.

References

SHOWING 1-10 OF 21 REFERENCES
Computer Architecture: A Quantitative Approach
This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important…
The Nonuniform Distribution of Instruction-Level and Machine Parallelism and Its Effect on Performance
  • N. Jouppi
  • Computer Science
  • IEEE Trans. Computers
  • 1989
TLDR
A methodology for quickly estimating machine performance is developed and is shown to be accurate to within 15% for three widely different machine pipelines: the CRAY-1, the MultiTitan, and a dual-issue superscalar machine.
Using cache memory to reduce processor-memory traffic
TLDR
It is demonstrated that a cache exploiting primarily temporal locality (look-behind) can indeed reduce traffic to memory greatly, and an elegant solution to the cache coherency problem is introduced.
A case for direct-mapped caches
  • M. Hill
  • Computer Science
  • Computer
  • 1988
Direct-mapped caches are defined, and it is shown that trends toward larger cache sizes and faster hit times favor their use. The arguments are restricted initially to single-level caches in…
IBM second-generation RISC machine organization
  • H. Bakoglu, G. Grohoski, +10 authors S. Dhawan
  • Computer Science
  • Digest of Papers Compcon Spring '90. Thirty-Fifth IEEE Computer Society International Conference on Intellectual Leverage
  • 1990
A highly concurrent second-generation superscalar reduced-instruction-set computer (RISC) is described. It combines a powerful RISC architecture with sophisticated hardware design techniques to…
Limits on Multiple Instruction Issue
TLDR
This paper investigates the limitations on designing a processor which can sustain an execution rate of greater than one instruction per cycle on highly-optimized, non-scientific applications and determines that these applications contain enough instruction independence to sustain an instruction rate of about two instructions per cycle.
Limits of instruction-level parallelism
TLDR
The results of simulations of 18 different test programs under 375 different models of available parallelism analysis are presented, showing how simulations based on instruction traces can model techniques at the limits of feasibility and even beyond.
Cache and memory hierarchy design: a performance-directed approach
TLDR
This paper presents a meta-modelling architecture that automates the labor-intensive, time-consuming, and therefore expensive process of designing and implementing a caching system.
High-bandwidth data memory systems for superscalar processors
TLDR
Multi-port, nonblocking (MPNB) L1 caches introduced in this paper for the top of the data memory hierarchy appear to be capable of supporting the bandwidth demands of future-generation superscalar processors.
Distributed-directory scheme: scalable coherent interface
TLDR
The scalable coherent interface (SCI), a local or extended computer backplane interface being defined by an IEEE standard project (P1596), is discussed and request combining, a useful feature of linked-list coherence, is described.