Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems
TLDR
This paper comprehensively evaluates several representative cache partitioning schemes with different optimization objectives, including performance, fairness, and quality of service (QoS), and provides new insights into dynamic behaviors and interaction effects.
Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models
TLDR
This paper provides an overview of a program synthesis system for a class of quantum chemistry computations that are expressible as a set of tensor contractions and arise in electronic structure modeling.
PARDA: A Fast Parallel Reuse Distance Analysis Algorithm
TLDR
This paper presents the first parallel algorithm to compute accurate reuse distances by analysis of memory address traces, using a tunable parameter that enables faster analysis when the maximum needed reuse distance is limited by a cache size upper bound.
Automatic code generation for many-body electronic structure methods: the tensor contraction engine
TLDR
An overview of the Tensor Contraction Engine (TCE), a unique effort to address issues of both productivity and performance through automatic code generation that acts like an optimizing compiler.
Soft-OLP: Improving Hardware Cache Performance through Software-Controlled Object-Level Partitioning
TLDR
Experimental results show that, in comparison with a standard L2 cache managed by LRU, Soft-OLP significantly reduces execution time by reducing L2 cache misses across inputs for a set of single- and multi-threaded programs from the SPEC CPU2000 benchmark suite, NAS benchmarks, and a computational kernel set.
Enabling software management for multicore caches with a lightweight hardware support
TLDR
This work proposes affordable, lightweight hardware support that coordinates with OS-based cache management policies. These policies scale to many cores and perform comparably with other proposed hardware solutions but have much lower overheads, so they can be easily adopted in commodity processors.
MCC-DB: Minimizing Cache Conflicts in Multi-core Processors for Databases
TLDR
This paper proposes a hybrid system method called MCC-DB for accelerating executions of warehouse-style queries, which relies on the DBMS knowledge of data access patterns to minimize LLC conflicts in multi-core systems through an enhanced OS facility of cache partitioning.
Data Layout Transformation for Enhancing Data Locality on NUCA Chip Multiprocessors
TLDR
This paper develops a compile-time framework for data locality optimization via data layout transformation using a polyhedral model and demonstrates the effectiveness of the approach on a 16-core 2D tiled CMP.
Combining analytical and empirical approaches in tuning matrix transposition
TLDR
An integrated optimization framework is developed that addresses a number of issues, including tiling for the memory hierarchy, effective handling of memory misalignment, utilizing memory subsystem characteristics, and the exploitation of the parallelism provided by the vector instruction sets in current processors.
Performance optimization of tensor contraction expressions for many-body methods in quantum chemistry.
TLDR
An effective algorithm for operation minimization with common subexpression identification is described, and its effectiveness on tensor contraction expressions for coupled cluster equations is demonstrated. A library for efficient index permutation of multidimensional tensors is also described.