Hardware transactional memory for GPU architectures

@inproceedings{Fung2011HardwareTM,
  title={Hardware transactional memory for GPU architectures},
  author={Wilson W. L. Fung and Inderpreet Singh and Andrew Brownsword and Tor M. Aamodt},
  booktitle={2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)},
  year={2011},
  pages={296--307}
}
Graphics processor units (GPUs) are designed to efficiently exploit thread level parallelism (TLP), multiplexing execution of 1000s of concurrent threads on a relatively smaller set of single-instruction, multiple-thread (SIMT) cores to hide various long latency operations. While threads within a CUDA block/OpenCL workgroup can communicate efficiently through an intra-core scratchpad memory, threads in different blocks can only communicate via global memory accesses. Programmers wishing to…
Highly Cited
This paper has 63 citations.