Decoupled access/execute computer architectures

@article{Smith1984DecoupledAC,
  title={Decoupled access/execute computer architectures},
  author={James E. Smith},
  journal={ACM Trans. Comput. Syst.},
  year={1984},
  volume={2},
  pages={289--308}
}
  • James E. Smith
  • Published 1 November 1984
  • Computer Science
  • ACM Trans. Comput. Syst.
An architecture for improving computer performance is presented and discussed. The main feature of the architecture is a high degree of decoupling between operand access and execution. This results in an implementation which has two separate instruction streams that communicate via queues. A similar architecture has been previously proposed for array processors, but in that context the software is called on to do most of the coordination and synchronization between the instruction… 
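
The queue-based communication described in the abstract can be illustrated with a minimal sketch. This is not the paper's design: the function name `run_decoupled`, the parameters `queue_capacity` and `execute_period`, and the element-wise addition workload are all assumptions made for illustration. An access stream fetches operands and writes results back, an execute stream does the arithmetic, and the two interact only through FIFO queues.

```python
from collections import deque


def run_decoupled(a, b, queue_capacity=4, execute_period=2):
    """Toy model: compute c[i] = a[i] + b[i] with two streams linked by queues.

    All names and parameters here are hypothetical, chosen for illustration.
    The execute stream fires only every `execute_period` steps, so the access
    stream can run ahead until the load queue is full.
    """
    n = len(a)
    c = [None] * n
    load_queue = deque()    # access -> execute: operand pairs
    store_queue = deque()   # execute -> access: results
    next_load = 0           # next element the access stream will fetch
    next_store = 0          # next element the access stream will write back
    max_occupancy = 0       # deepest the load queue ever got
    step = 0

    while next_store < n:
        # Access stream: fetch operands whenever the load queue has room.
        if next_load < n and len(load_queue) < queue_capacity:
            load_queue.append((a[next_load], b[next_load]))
            next_load += 1
        max_occupancy = max(max_occupancy, len(load_queue))

        # Execute stream: slower than the access stream in this toy model.
        if (step % execute_period == 0 and load_queue
                and len(store_queue) < queue_capacity):
            x, y = load_queue.popleft()
            store_queue.append(x + y)

        # Access stream: drain results back to memory in order.
        if store_queue:
            c[next_store] = store_queue.popleft()
            next_store += 1

        step += 1

    return c, max_occupancy


if __name__ == "__main__":
    a = list(range(8))
    b = [10 * i for i in range(8)]
    result, occupancy = run_decoupled(a, b)
    print(result, occupancy)
```

Because the queues are FIFO, results reach memory in program order even though the access stream runs ahead of the execute stream; the bounded capacity is what eventually stalls the faster stream.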


Code generation for streaming: an access/execute mechanism
TLDR
The code generation and optimization algorithms that are used in an optimizing compiler for an architecture that contains explicit hardware support for the access/execute model of computation are described.
SMT Possibilities for Decoupled Architecture
TLDR
This paper discusses what programs developed for decoupled architectures might look like when the decoupling is an explicit part of the programming model, and presents a methodology to determine properties of such programs before full-scale architectural development and simulation is undertaken.
Decoupled vector architectures
  • R. Espasa, M. Valero
  • Computer Science
    Proceedings. Second International Symposium on High-Performance Computer Architecture
  • 1996
TLDR
An important part of this paper is devoted to studying the tradeoffs involved in choosing an adequate size for the different queues of the architecture, so that the hardware cost of the queues can be minimized while still retaining most of the performance advantages of decoupling.
Compilation to a queue-based architecture
TLDR
This thesis presents a programming language and compiler that capitalize on the strengths of the Aries Decentralized Abstract Machine and demonstrates performance double that of the naive non-streaming implementation of the ADAM.
Performance of Parallel Loops using Alternative Cache Consistency Protocols on a Non-Bus Multiprocessor
TLDR
A preliminary study of the performance of parallel loops on a non-bus shared-memory multiprocessor, examining the impact on parallel performance of two software and two hardware cache consistency techniques as well as three scheduling policies.
Exploiting Parallelism Between Control and Data Computation
TLDR
This paper proposes a technique to reduce the control flow bottleneck by observing that much of the control flow computation can be performed in parallel with data computation, and proposes an architecture to execute the control and work threads in parallel.
A Decoupled Architecture of Processors with Scratch-Pad Memory Hierarchy
TLDR
This paper presents a decoupled architecture of processors with a memory hierarchy of only scratch-pad memories, and a main memory that achieves the above performance with insignificant overheads in terms of area.
A Decoupled Architecture for Accelerating Multimedia Applications
TLDR
This paper presents an architecture that decouples the useful/true computations from the overhead/supporting instructions in media applications and is incorporated into an out-of-order general-purpose processor enhanced with SIMD extensions.
Execution Performance of the Scheduled Dataflow Architecture (SDF)
TLDR
The non-blocking and functional nature of SDF makes it easier to coordinate the memory accesses and execution of a thread, as well as to eliminate unnecessary dependencies among instructions.

References

Decoupled access/execute computer architectures
TLDR
An architecture for improving computer performance which has a high degree of decoupling between operand access and execution, resulting in an implementation which has two separate instruction streams that communicate via queues.
Pipe: a high performance VLSI architecture
TLDR
The pipe architecture (parallel instructions and pipelined execution) is proposed as a research vehicle for studying high performance VLSI architectures and organizations and its planned implementation is described.
The IBM System/360 model 91: machine philosophy and instruction-handling
TLDR
It is shown that history recording (the retention of complete instruction loops in the CPU) reduces the need to exercise storage, and that sophisticated employment of buffering techniques has reduced the effective access time.
Very high-speed computing systems
TLDR
The constituents of a system: storage, execution, and instruction handling (branching) are discussed with regard to recent developments and/or systems limitations.
Percolation of Code to Enhance Parallel Dispatching and Execution
This note investigates the increase in parallel execution rate as a function of the size of an instruction dispatch stack with lookahead hardware. Under the constraint that instructions are not
Information content of CPU memory referencing behavior
TLDR
Techniques are developed for analyzing the effectiveness of the addressing architecture and Memory/CPU traffic of existing machines with respect to the information theoretic bound for a given trace.
Technology and Design Tradeoffs in the Creation of a Modern Supercomputer
  • N. Lincoln
  • Computer Science
    IEEE Transactions on Computers
  • 1982
TLDR
This paper is an attempt to elevate supercomputer development from the mystique of being an art to the level of a science of synergistic combination of programming, technology, structure, and packaging.
Coding guidelines for pipelined processors
TLDR
This paper is a tutorial for assembly language programmers of pipelined processors and presents a collection of coding guidelines for them, particularly significant to compiler developers who determine object code patterns.
Functionally Parallel Architecture for Array Processors
Based on the natural division of mathematical problems, functional parallelism becomes the architectural key for improving speed/cost ratios for array processors.
Detection and Parallel Execution of Independent Instructions
For a single instruction stream–single data stream organization the problem of simultaneously issuing several instructions is studied.