Shuaiwen Song

Learn More
Energy efficiency is a major concern in modern high-performance computing system design. In the past few years, there has been mounting evidence that power usage limits system scale and computing density, and thus, ultimately system performance. However, despite the impact of power and energy on the computer systems community, few studies provide insight to(More)
Emergent heterogeneous systems must be optimized for both power and performance at exascale. Massive parallelism combined with complex memory hierarchies form a barrier to efficient application and architecture design. These challenges are exacerbated with GPUs as parallelism increases orders of magnitude and power consumption can easily double. Models have(More)
As the march to the exascale computing gains momentum, energy consumption of supercomputers has emerged to be the critical roadblock. While architectural innovations are imperative in achieving computing of this scale, it is largely dependent on the systems software to leverage the architectural innovations. Parallel applications in many computationally(More)
Recent work on real-world graph analytics has sought to leverage the massive amount of parallelism offered by GPU devices, but challenges remain due to the inherent irregularity of graph algorithms and limitations in GPU-resident memory for storing large graphs. We present GraphReduce, a highly efficient and scalable GPU-based framework that operates on(More)
This paper presents novel cache optimizations for massively parallel, throughput-oriented architectures like GPUs. L1 data caches (L1 D-caches) are critical resources for providing high-bandwidth and low-latency data accesses. However, the high number of simultaneous requests from single-instruction multiple-thread (SIMT) cores makes the limited capacity of(More)
Support Vector Machine (SVM) has been widely used in data-mining and Big Data applications as modern commercial databases start to attach an increasing importance to the analytic capabilities. In recent years, SVM was adapted to the field of High Performance Computing for power/performance prediction, auto-tuning, and runtime scheduling. However, even at(More)
Accelerators are used in about 13% of the current Top500 List. Supercomputers leveraging accelerators grew by a factor of 2.2x in 2012 and are expected to completely dominate the Top500 by 2015. Though most of these deployments use NVIDIA GPGPU accelerators, Intel's Xeon Phi architecture will likely grow in popularity in the coming years. Unfortunately,(More)
Future large scale high performance supercomputer systems require high energy efficiency to achieve exaflops computational power and beyond. Despite the need to understand energy efficiency in high-performance systems, there are few techniques to evaluate energy efficiency at scale. In this paper, we propose a system-level iso-energy-efficiency model to(More)
Future high performance systems must use energy efficiently to achieve PFLOPS computational speeds and beyond. To address this challenge, we must first understand the power and energy characteristics of high performance computing applications. In this paper, we use a power-performance profiling framework called PowerPack to study the power and energy(More)
The insatiable demand of high performance computing is being driven by the most computationally intensive applications such as computational chemistry, climate modeling, nuclear physics, etc. The last couple of decades have observed a tremendous rise in supercomputers with architectures ranging from traditional clusters to system-on-a-chip in order to(More)