• Publications
  • Influence
PowerPack: Energy Profiling and Analysis of High-Performance Systems and Applications
tl;dr
We extend our framework to support systems with multicore, multiprocessor-based nodes, and then provide in-depth analyses of the energy consumption of parallel applications on clusters.
  • 386
  • 33
  • Open Access
A Simplified and Accurate Model of Power-Performance Efficiency on Emergent GPU Architectures
tl;dr
We combine hardware performance counter data with machine learning and advanced analytics to model power-performance efficiency for modern GPU-based systems.
  • 112
  • 12
  • Open Access
Superneurons: dynamic GPU memory management for training deep neural networks
tl;dr
We present SuperNeurons, a dynamic GPU memory scheduling runtime that enables network training far beyond the GPU DRAM capacity.
  • 68
  • 10
  • Open Access
GraphReduce: processing large-scale graphs on accelerator-based systems
tl;dr
We present GraphReduce, a highly efficient and scalable GPU-based framework that operates on graphs that exceed the device's internal memory capacity.
  • 49
  • 7
  • Open Access
Locality-Driven Dynamic GPU Cache Bypassing
tl;dr
This paper presents novel cache optimizations for massively parallel, throughput-oriented architectures like GPUs.
  • 81
  • 5
  • Open Access
Locality-Aware CTA Clustering for Modern GPUs
tl;dr
We propose the concept of CTA-Clustering and its associated software-based techniques to reshape the default CTA scheduling in order to group the CTAs with potential reuse together on the same SM.
  • 41
  • 5
  • Open Access
Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect
tl;dr
In this paper, we fill this gap by thoroughly characterizing a variety of modern GPU interconnects, including PCIe, NVLink Version-1, NVLink Version-2, NV-SLI and NVSwitch, from six high-end servers and HPC platforms.
  • 27
  • 4
  • Open Access
Investigating the Interplay between Energy Efficiency and Resilience in High Performance Computing
tl;dr
We present an energy-saving undervolting approach that leverages mainstream resilience techniques to tolerate the increased failures caused by undervolting.
  • 23
  • 3
  • Open Access
Processing-in-Memory Enabled Graphics Processors for 3D Rendering
tl;dr
We propose two architectural designs to enable Processing-In-Memory based GPUs for efficient 3D rendering.
  • 19
  • 3
  • Open Access
Designing energy efficient communication runtime systems: a view from PGAS models
tl;dr
In this paper, we present a design for a Power Aware One-Sided Communication Library (PASCoL).
  • 24
  • 3
  • Open Access