With speculative parallelization, code sections that cannot be fully analyzed by the compiler are aggressively executed in parallel. Hardware schemes are fast but expensive and require modifications to the processors and memory system. Software schemes require no extra hardware but can be inefficient. This paper proposes a new software-only speculative(More)
With speculative parallelization, code sections that cannot be fully analyzed by the compiler are optimistically executed in parallel. Hardware schemes are fast but expensive and require modifications to the processors and/or memory system. Software schemes require no changes to the hardware of existing shared-memory systems, but can suffer from(More)
Speculative parallelization aggressively executes in parallel codes that cannot be fully parallelized by the compiler. Past proposals of hardware schemes have mostly focused on single-chip multiprocessors (CMPs), whose effectiveness is necessarily limited by their small size. Very few schemes have attempted this technique in the context of scalable(More)
With speculative thread-level parallelization, codes that cannot be fully compiler-analyzed are aggressively executed in parallel. If the hardware detects a cross-thread dependence violation, it squashes offending threads and resumes execution. Unfortunately, frequent squashing cripples performance. This paper proposes a new framework of hardware mechanisms(More)
Speculative parallelization is a technique that complements automatic compiler parallelization by allowing code sections that cannot be fully analyzed by the compiler to be aggressively executed in parallel. However, while speculative parallelization can potentially deliver significant speedups, several overheads associated with the technique limit these(More)
Finding the fastest algorithm to solve a problem is one of the main issues in Computational Geometry. Focusing only on worst case analysis or asymptotic computations leads to the development of complex data structures or hard to implement algorithms. Randomized algorithms appear in this scenario as a very useful tool in order to obtain easier(More)
While hardware-coherent scalable shared-memory multiprocessors are relatively easy to program, they still require substantial programming effort to deliver high performance. Specifically, to minimize remote accesses, data must be carefully laid out in memory for locality and application working sets carefully tuned for caches. It has been claimed that this(More)
The large latency of memory accesses is a major obstacle in obtaining high processor utilization in large scale shared-memory multiprocessors. Access to remote memory is likely to be slow, compared to the ever-increasing speeds of processors. Thus, any scalable architecture must rely on techniques that can cope with the large latency of memory accesses to(More)