- Full text PDF available (21)
- This year (0)
- Last 5 years (8)
- Last 10 years (22)
Journals and Conferences
As technology scaling poses a threat to DRAM scaling due to physical limitations such as limited charge, alternative memory technologies including several emerging non-volatile memories are being explored as possible DRAM replacements. One main roadblock for wider adoption of these new memories is the limited write endurance, which leads to wear-out related… (More)
Phase change memory (PCM) is an emerging memory technology for future computing systems. Compared to other non-volatile memory alternatives, PCM is more matured to production, and has a faster read latency and potentially higher storage density. The main roadblock precluding PCM from being used, in particular, in the main memory hierarchy, is its limited… (More)
An updated take on Amdahl's analytical model uses modern design constraints to analyze many-core design alternatives. The revised models provide computer architects with a better understanding of many-core design types, enabling them to make more informed tradeoffs.
Memory bandwidth has become a major performance bottleneck as more and more cores are integrated onto a single die, demanding more and more data from the system memory. Several prior studies have demonstrated that this memory bandwidth problem can be addressed by employing a 3D-stacked memory architecture, which provides a wide, high frequency memory-bus… (More)
This paper describes the architecture, design, analysis, and simulation and measurement results of the 3D-MAPS (3D massively parallel processor with stacked memory) chip built with a 1.5 V, 130 nm process technology and a two-tier 3D stacking technology using 1.2 μ -diameter, 6 μ -height through-silicon vias (TSVs) and μ -diameter face-to-face bond pads.… (More)
Due to the ever-increasing design complexity and physical constraint in frequency scaling, chip multiprocessors are considered the de facto architecture baseline for future processor generation. Through resource sharing, applications running on a CMP can achieve better resource utilization and faster inter-core communication, leading to a higher overall… (More)
A traditional fixed-function graphics accelerator has evolved into a programmable general-purpose graphics processing unit over the last few years. These powerful computing cores are mainly used for accelerating graphics applications or enabling low-cost scientific computing. To further reduce the cost and form factor, an emerging trend is to integrate GPU… (More)
Single-chip CPU/GPU architecture is being adopted in high-end (embedded) systems, e.g., smartphones and tablet PCs. Main memory subsystem is expected to consist of hybrid DRAM and phase-change RAM (PRAM) due to the difficulties in DRAM scaling. In this work, we address the performance optimization of the hybrid DRAM/PRAM main memory for single chip CPU/GPU.… (More)
We describe the design and analysis of 3D-MAPS, a 64-core 3D-stacked memory-on-processor running at 277 MHz with 63 GB/s memory bandwidth, sent for fabrication using Tezzaron's 3D stacking technology. We also describe the design flow used to implement it using industrial 2D tools and custom add-ons to handle 3D specifics.
SIMD execution units in GPUs are increasingly used for high performance and energy efficient acceleration of general purpose applications. However, SIMD control flow divergence effects can result in reduced execution efficiency in a class of GPGPU applications, classified as divergent applications. Improving SIMD efficiency, therefore, has the potential to… (More)