The progressive integration of processor and memory has unexpected implications for the design of DSM systems. To best exploit this integration, we argue that the nodes of DSM systems must first be redesigned and the whole machine then reorganized. In this paper, we propose a new DSM organization in which processor nodes have their on-chip memories configured as…
Optimizing on-chip primary data caches for parallel scientific applications is challenging because different applications behave differently. While some applications exhibit good spatial locality, others issue accesses with long strides that prevent effective use of cache lines. Still other applications cannot exploit long lines…
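The effect of long strides on cache-line usage can be illustrated with a small calculation. The sketch below is hypothetical and not taken from the paper. It estimates the average fraction of each fetched line that a single strided sweep over a large array actually uses. It assumes 8-byte elements, 64-byte lines, aligned data, and no reuse across sweeps; `line_utilization` is an illustrative name.

```python
def line_utilization(stride_elems: int, elem_bytes: int = 8, line_bytes: int = 64) -> float:
    """Average fraction of each fetched cache line used by one strided sweep.

    Illustrative assumptions: a long sweep over a large, line-aligned array,
    no reuse of lines across sweeps, and default sizes of 8-byte elements
    and 64-byte lines.
    """
    if stride_elems < 1:
        raise ValueError("stride must be at least one element")
    stride_bytes = stride_elems * elem_bytes
    # Each access consumes elem_bytes. On average, one access occurs every
    # stride_bytes, but never less often than once per line: once the stride
    # reaches the line size, every access pulls in a fresh line.
    bytes_fetched_per_access = min(stride_bytes, line_bytes)
    return elem_bytes / bytes_fetched_per_access


if __name__ == "__main__":
    for s in (1, 2, 4, 8, 16):
        print(f"stride {s:2d} elements -> {line_utilization(s):.3f} of each line used")
```

Under these assumptions, a unit-stride sweep uses every byte it fetches. A stride of eight or more 8-byte elements uses only one eighth of each 64-byte line. Making lines longer helps the first case but wastes even more bandwidth in the second.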
