• Corpus ID: 6913858

Memory Latency Effects in Decoupled Architectures

@inproceedings{Kurian1994MemoryLE,
  title={Memory Latency Effects in Decoupled Architectures},
  author={Lizyamma Kurian and Paul T. Hulina and Lee D. Coraor},
  year={1994}
}
Decoupled computer architectures partition the memory access and execute functions in a computer program and achieve high performance by exploiting the fine-grain parallelism between the two. These architectures make use of an access processor to perform the data fetch ahead of demand by the execute process and hence are often less sensitive to memory access delays than conventional architectures. Past performance studies of decoupled computers used memory systems that are interleaved or… 
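The access/execute split described in the abstract can be illustrated with a toy cycle-level model. This is a minimal sketch only: the queue depth, 4-cycle memory latency, and per-item compute time are illustrative assumptions, not figures from the paper.

```python
from collections import deque

# Toy cycle-level sketch of the decoupled access/execute idea: an access
# processor prefetches operands into a queue ahead of demand, so the
# execute processor rarely stalls on memory. All parameters are assumed.

def run_decoupled(n_items, compute_per_item, mem_latency=4, depth=8):
    """Return (total cycles, execute-side stall cycles)."""
    queue = deque()              # arrival cycles of in-flight loads
    cycle = fetched = done = stalls = 0
    busy_until = 0               # execute unit is occupied until this cycle
    while done < n_items:
        # Access processor: issue one load per cycle while the queue has
        # room, running ahead of the execute processor's demand.
        if fetched < n_items and len(queue) < depth:
            queue.append(cycle + mem_latency)
            fetched += 1
        # Execute processor: consume the oldest operand once it has arrived.
        if cycle >= busy_until:
            if queue and queue[0] <= cycle:
                queue.popleft()
                busy_until = cycle + compute_per_item
                done += 1
            else:
                stalls += 1      # waiting on memory
        cycle += 1
    return cycle, stalls

cycles, stalls = run_decoupled(16, 2)
print(cycles, stalls)  # → 35 4
```

In this model only the first load's latency is exposed as stall cycles (4 stalls for a 4-cycle latency); every later fetch overlaps with computation, which is the behavior the abstract attributes to the access processor fetching ahead of demand. A coupled design that paid the full latency on every load would instead take roughly 16 × (4 + 2) = 96 cycles.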
Program balance and its impact on high performance RISC architectures
TLDR
This paper presents studies on the balance of access and computation tasks on a typical RISC architecture, the MIPS, and discusses how these instruction stream characteristics can limit the instruction issue in superscalar processors.
Decoupled vector architectures
  • R. Espasa, M. Valero
  • Computer Science
    Proceedings. Second International Symposium on High-Performance Computer Architecture
  • 1996
TLDR
An important part of this paper is devoted to study the tradeoffs involved in choosing an adequate size for the different queues of the architecture, so that the hardware cost of the queues can be minimized while still retaining most of the performance advantages of decoupling.
Decoupled Vector Architectures: a first look
TLDR
It is shown that by using decoupling techniques in a vector processor, the performance of vector programs can be greatly improved, and that the architecture tolerates long memory latencies so well that it could be feasible to use very slow DRAM parts in vector computers in order to reduce cost.
A Simulation Study of Decoupled Vector Architectures
TLDR
This paper simulates a selection of the Perfect Club and Specfp92 benchmark suites and compares their execution time on a conventional single port vector architecture with that of a decoupled vector architecture.
Out-of-order vector architectures
TLDR
A new technique based on register renaming is targeted at dynamically eliminating spill code and is shown to provide an extra speedup ranging between 1.10 and 1.20 while reducing total memory traffic by an average of 15-20%.
Multithreaded vector architectures
  • R. Espasa, M. Valero
  • Computer Science
    Proceedings Third International Symposium on High-Performance Computer Architecture
  • 1997
TLDR
It is shown that multithreading techniques can be applied to a vector processor to greatly increase processor throughput and maximize resource utilization, and that multithreading provides a performance advantage for this architecture.

References

SHOWING 1-10 OF 37 REFERENCES
Memory latency effects in decoupled architectures with a single data memory module
TLDR
This work conducts a simulation study of the latency effects in decoupled computers when connected to a single, conventional non-interleaved data memory module so that the effect of decoupling is isolated from the improvement caused by interleaving.
The effects of memory latency and fine-grain parallelism on Astronautics ZS-1 performance
TLDR
A comparison indicates how well the Astronautics ZS-1 tolerates increased memory latency as a function of slip and provides insights regarding application codes, architectures, and compiler capabilities.
Structured Memory Access Architecture
TLDR
The authors investigate one method of reducing the Von Neumann bottleneck by lowering addressing overhead; the proposed architecture improves system performance by efficiently generating operand requests, making fewer memory references, and maximizing the overlap between computation and access processes.
A Simulation Study of Decoupled Architecture Computers
TLDR
This paper presents a decoupled architecture that splits instruction processing into two separate sets of instructions, one for accessing memory and one for performing function execution.
Improving performance of small on-chip instruction caches
TLDR
An alternative approach is presented in this paper, in which a combination of an instruction cache, instruction queue and instruction queue buffer is used to achieve the same effect with a much smaller instruction cache size.
Performance evaluation of the pipe computer architecture
TLDR
The simulator and the simulation results of this study can be used to guide the efforts in hardware implementation and software development for the PIPE architecture.
Performance evaluation of on-chip register and cache organizations
TLDR
This paper compares several different local memory organizations applicable for single-chip processors to determine effective access time, since a wide variety of register and cache organizations are used.
Performance trade-offs for microprocessor cache memories
TLDR
This study indicates that lessons from mainframe and minicomputer design practice should be critically examined to benefit the design of microprocessors.
Classification and Performance Evaluation of Instruction Buffering Techniques
TLDR
This paper classifies these buffers into traditional instruction buffers, conventional instruction caches, and prefetch queues, details their prominent features, and evaluates the performance of buffers in several existing systems using trace-driven simulation.
Classification and performance evaluation of instruction buffering techniques
TLDR
The speed disparity between processor and memory subsystems has been bridged in many existing large-scale scientific computers and microprocessors with the help of instruction buffers or instruction caches, and a metric is proposed for the implementation of this approach.