• Publications
An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
TLDR
GPU architectures are increasingly important in the multi-core era due to their high number of parallel processors.
  • 633
  • 64
Qilin: Exploiting parallelism on heterogeneous multiprocessors with adaptive mapping
TLDR
We propose adaptive mapping, a fully automatic technique to map computations to processing elements on a CPU+GPU machine.
  • 544
  • 48
An integrated GPU power and performance model
GPU architectures are increasingly important in the multi-core era due to their high number of parallel processors. Performance optimization for multi-core processors has been a challenge for…
  • 454
  • 44
Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers
TLDR
This paper proposes a mechanism that incorporates dynamic feedback into the design of the prefetcher to increase the performance improvement provided by prefetching as well as to reduce the negative performance impact of large main memory latencies.
  • 286
  • 35
Inferring Fine-grained Control Flow Inside SGX Enclaves with Branch Shadowing
TLDR
We explore a new, yet critical, side-channel attack, branch shadowing, that reveals fine-grained control flows (branch granularity) in an enclave.
  • 267
  • 26
Transparent Hardware Management of Stacked DRAM as Part of Memory
TLDR
This paper proposes a practical, low-cost architectural solution to efficiently enable using large fast memory as Part-of-Memory (PoM) seamlessly, without the involvement of the OS.
  • 74
  • 18
SD3: A Scalable Approach to Dynamic Data-Dependence Profiling
TLDR
We propose a scalable approach to data-dependence profiling that addresses both runtime and memory overhead in a single framework by parallelizing the dependence profiling step itself.
  • 94
  • 14
GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks
TLDR
We present GraphPIM, a full-stack solution for graph computing that achieves higher performance using PIM functionality.
  • 98
  • 14
Many-Thread Aware Prefetching Mechanisms for GPGPU Applications
TLDR
We consider the problem of how to improve memory latency tolerance in massively multithreaded GPGPUs when the thread-level parallelism of an application is not sufficient to hide memory latency.
  • 125
  • 10
GraphBIG: understanding graph computing in the context of industrial solutions
TLDR
We present GraphBIG, a graph computing benchmark suite, inspired by the IBM System G project, that covers major graph computation types and data sources.
  • 93
  • 10