Publications
Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems
TLDR
Cache partitioning and sharing is critical to the effective utilization of multicore processors.
Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models
TLDR
This paper provides an overview of a program synthesis system for a class of quantum chemistry computations.
PARDA: A Fast Parallel Reuse Distance Analysis Algorithm
TLDR
We present Parda, a parallel algorithm to compute accurate reuse distances by analysis of memory address traces.
Automatic code generation for many-body electronic structure methods: the tensor contraction engine
TLDR
We present an overview of the Tensor Contraction Engine (TCE), a unique effort to address issues of both productivity and performance through automatic code generation.
Soft-OLP: Improving Hardware Cache Performance through Software-Controlled Object-Level Partitioning
TLDR
We present a system-software framework that partitions the last-level cache at the object level, in order to improve program performance for both single-thread and parallel data-sharing programs.
Enabling software management for multicore caches with a lightweight hardware support
TLDR
The management of shared caches in multicore processors is a critical and challenging task.
MCC-DB: Minimizing Cache Conflicts in Multi-core Processors for Databases
TLDR
We propose a hybrid system method called MCC-DB for accelerating the execution of warehouse-style queries. It relies on the DBMS's knowledge of data access patterns to minimize last-level cache (LLC) conflicts in multi-core systems through an enhanced OS facility for cache partitioning.
Data Layout Transformation for Enhancing Data Locality on NUCA Chip Multiprocessors
TLDR
We develop a compile-time framework for data locality optimization via data layout transformation to reduce non-local accesses for localizable computations.
Combining analytical and empirical approaches in tuning matrix transposition
TLDR
We develop an integrated optimization framework that addresses a number of issues, including tiling for the memory hierarchy, effective handling of memory misalignment, utilizing memory subsystem characteristics, and the exploitation of the parallelism provided by the vector instruction sets in current processors.
Performance optimization of tensor contraction expressions for many-body methods in quantum chemistry.
Complex tensor contraction expressions arise in accurate electronic structure models in quantum chemistry, such as the coupled cluster method. This paper addresses two complementary aspects of ...