This paper presents novel cache optimizations for massively parallel, throughput-oriented architectures like GPUs. L1 data caches (L1 D-caches) are critical resources for providing high-bandwidth and low-latency data accesses. However, the high number of simultaneous requests from single-instruction multiple-thread (SIMT) cores makes the limited capacity of …
On-chip caches are commonly used in computer systems to hide long off-chip memory access latencies. On-chip caches can be managed by either software or hardware schemes. State-of-the-art accelerators, such as the NVIDIA Fermi or Kepler GPUs and Intel's forthcoming MIC “Knights Landing” (KNL), support both software-managed …
The large number of memory requests issued by massive numbers of threads can easily cause cache contention and cache-miss-related resource congestion on GPUs. This paper proposes a simple yet effective performance model that estimates the impact of cache contention and resource congestion as a function of the number of warps/thread blocks (TBs) that bypass the cache. Then we …
Cultivars of hot pepper (Capsicum annuum L.) vary greatly in their fruit cadmium (Cd) concentration. Previously, we identified a low-Cd (YCT) and a high-Cd (JFZ) cultivar. In this study, we elucidated the physiological mechanisms responsible for the differences in their Cd accumulation. A time-dependent and concentration-dependent hydroponic experiment was …
Caches are universally used in computing systems to hide long off-chip memory access latencies. Unlike on CPUs, the massive number of threads running simultaneously on GPUs puts tremendous pressure on the memory hierarchy. As a result, limited cache resources become a bottleneck that prevents a GPU from fully exploiting thread-level parallelism (TLP) and memory-level parallelism (MLP) …
The root cadmium (Cd) concentrations of sweet potato (Ipomoea batatas [L.] Lam.) cultivars vary greatly. In this study, we explored the role of shoots and roots in Cd accumulation, as well as the underlying physiological mechanisms, using the previously identified high-Cd cultivar X16 and low-Cd cultivar N88. We used the split-root technique and …
Because of their high throughput and power efficiency, massively parallel architectures like graphics processing units (GPUs) have become a popular platform for general-purpose computing. However, there are few studies and analyses of GPU instruction set architectures (ISAs), although it is well known that the ISA is a fundamental design issue of all modern …