ISA-independent workload characterization and its implications for specialized architectures
The end of Dennard scaling has necessitated research into specialized architectures for offloading specific code regions in applications. Recent work on accelerator architectures has chosen diverse workloads, and even diverse code regions within the same workload, to highlight the efficacy of specific accelerator architectures. However, this makes it challenging to compare the power and performance benefits of each accelerator. It is unclear whether, in the era of specialization, it will be feasible to standardize a new set of kernels across different architectural ideas. We present an alternative vision in which we identify and prepare “acceleratable” code regions from existing, widely used CPU-based benchmark suites. We identify acceleratable paths by leveraging program analysis to precisely identify frequently executed directed acyclic paths. We reconstruct each path as a separate function within the original binary and demarcate the accelerator region, which enables microarchitecture-independent analysis and precise profiling when the program is executed on an architecture simulator or instrumentation tool (e.g., Intel Pin). We extract “accelerator” offload targets from frequently executed paths for 29 workloads across the SPEC2000, SPECCPU2006, PARSEC and PERFECT benchmark suites and demonstrate that characterization along paths is more precise than the coarser-granularity characterization used in prior work. Overall, we analyze 356K paths across the 29 workloads and present statistics for the top 5 paths identified for offload in each application. We have also generated a workload suite containing the acceleratable code paths to help computer architecture researchers.
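
To make the idea of ranking frequently executed directed acyclic paths concrete, the following is a minimal sketch that assumes Ball-Larus path numbering, a standard path-profiling technique. The abstract does not name the algorithm actually used, so this choice is an assumption. The function names, the toy control-flow graph, and the traces are all hypothetical, and the code only illustrates how each acyclic path can be given a unique identifier and how the hottest paths can then be selected.

```python
from collections import Counter

def ball_larus_edge_values(cfg, exit_node):
    """Assign an integer to every edge of an acyclic CFG so that the sum of
    edge values along each path to exit_node is a unique path id.
    cfg maps a node to its ordered successor list (assumed: DAG, single exit)."""
    num_paths = {}   # node -> number of distinct paths from node to exit
    edge_val = {}    # (src, dst) -> increment added when the edge is taken

    def visit(node):
        if node in num_paths:
            return num_paths[node]
        if node == exit_node:
            num_paths[node] = 1
            return 1
        total = 0
        for succ in cfg[node]:
            edge_val[(node, succ)] = total   # offset past ids used by earlier successors
            total += visit(succ)
        num_paths[node] = total
        return total

    for node in cfg:
        visit(node)
    return num_paths, edge_val

def path_id(trace, edge_val):
    """Path id of one executed path, given as a sequence of basic blocks."""
    return sum(edge_val[(a, b)] for a, b in zip(trace, trace[1:]))

def top_paths(traces, edge_val, k=5):
    """Return the k most frequently executed path ids with their counts."""
    return Counter(path_id(t, edge_val) for t in traces).most_common(k)

# Hypothetical CFG with two back-to-back diamonds (four acyclic paths).
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"],
       "D": ["E", "F"], "E": ["G"], "F": ["G"], "G": []}
num_paths, edge_val = ball_larus_edge_values(cfg, "G")

# Hypothetical execution traces, one per dynamic traversal of the region.
traces = [list("ABDFG")] * 3 + [list("ABDEG")] * 2 + [list("ACDEG")]
hottest = top_paths(traces, edge_val, k=2)
```

In this toy example the graph has four acyclic paths, and `hottest` lists the two path ids that occur most often in the traces. A real flow would obtain the traces from instrumentation and would likely weight each path by the number of instructions it covers rather than by raw frequency alone.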