Performance evaluation of Intel® Transactional Synchronization Extensions for high-performance computing
TLDR
In this paper, we evaluate the first hardware implementation of Intel TSX using a set of high-performance computing (HPC) workloads, and demonstrate that applying Intel TSX to these workloads can provide significant performance improvements.
  • 243
  • 50
Carbon: architectural support for fine-grained parallelism on chip multiprocessors
TLDR
We propose Carbon, a hardware technique to accelerate dynamic task scheduling on scalable CMPs with small tasks for which software task schedulers achieve only limited parallel speedups.
  • 225
  • 15
IMP: Indirect memory prefetcher
TLDR
We propose an efficient hardware indirect memory prefetcher to capture irregular memory accesses resulting from following edges in a graph or non-zero elements in a sparse matrix.
  • 68
  • 13
Convergence of Recognition, Mining, and Synthesis Workloads and Its Implications
TLDR
This paper examines the growing need for a general-purpose "analytics engine" that can enable next-generation processing platforms to effectively model events, objects, and concepts based on end-user input and accessible datasets, along with an ability to iteratively refine the model in real time.
  • 113
  • 11
Speculative precomputation: long-range prefetching of delinquent loads
TLDR
This paper explores Speculative Precomputation, a technique that uses idle thread contexts in a multithreaded architecture to improve performance of single-threaded applications.
  • 153
  • 10
RSIM: Simulating Shared-Memory Multiprocessors with ILP Processors
TLDR
The early 1990s saw several announcements of commercial shared-memory systems using processors that aggressively exploited instruction-level parallelism (ILP), including the MIPS R10000, Hewlett-Packard PA8000, and Intel Pentium Pro.
  • 217
  • 9
Saving energy with architectural and frequency adaptations for multimedia applications
TLDR
This paper develops and evaluates an integrated algorithm to control both architectural adaptation and DVS for multimedia applications; the algorithm is effective in saving energy and exploits most of the available potential.
  • 85
  • 8
Soft real-time scheduling on simultaneous multithreaded processors
TLDR
We explore soft real-time co-scheduling on an SMT processor, focusing more on co-schedule selection and resource sharing.
  • 102
  • 7
Saving energy with architectural and frequency adaptations for multimedia applications
TLDR
This paper develops and evaluates an integrated algorithm to control both architectural adaptation and dynamic voltage (and frequency) scaling targeted to multimedia applications.
  • 87
  • 7
Hybrid transactional memory
TLDR
We propose a novel hybrid hardware-software transactional memory scheme that approaches the performance of a hardware scheme when resources are not exhausted and gracefully falls back to a software scheme otherwise.
  • 116
  • 7