Pin: building customized program analysis tools with dynamic instrumentation
- C. Luk, R. Cohn, Kim M. Hazelwood
- Computer Science, ACM-SIGPLAN Symposium on Programming Language…
- 12 June 2005
Pin's goals are to provide easy-to-use, portable, transparent, and efficient instrumentation. To illustrate Pin's versatility, two Pintools in daily use to analyze production software are described.
Qilin: Exploiting parallelism on heterogeneous multiprocessors with adaptive mapping
Adaptive mapping is proposed, a fully automatic technique to map computations to processing elements on a CPU+GPU machine. It is shown that, by judiciously distributing work over the CPU and GPU, automatic adaptive mapping achieves on average a 25% reduction in execution time and a 20% reduction in energy consumption compared with static mappings for a set of important computation benchmarks.
The pochoir stencil compiler
- Yuan Tang, R. Chowdhury, Bradley C. Kuszmaul, C. Luk, C. Leiserson
- Computer Science, ACM Symposium on Parallelism in Algorithms and…
- 4 June 2011
The Pochoir stencil compiler allows a programmer to write a simple specification of a stencil in a domain-specific stencil language embedded in C++, which the Pochoir compiler then translates into high-performing Cilk code that employs an efficient parallel cache-oblivious algorithm.
Compiler-based prefetching for recursive data structures
It is demonstrated that compiler-inserted prefetching can significantly improve the execution speed of pointer-based codes, by as much as 45% for the applications the authors study, and that more sophisticated prefetching schemes can improve performance by as much as twofold.
CMP$im: A Pin-Based On-The-Fly Multi-Core Cache Simulator
This paper proposes binary instrumentation as an alternative to execution-driven and trace-driven simulation methodologies for exploring the design space of a CMP memory hierarchy. It presents CMP$im, which characterizes the cache performance of single-threaded, multi-threaded, and multi-programmed workloads at speeds of 4-10 MIPS.
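As a rough illustration of what characterizing cache performance from a stream of memory addresses involves, the sketch below models a single set-associative cache with LRU replacement. It is not CMP$im's implementation: the `Cache` class, its default parameters, and the hand-written address list are all hypothetical, and in a Pin-based tool the addresses would come from instrumented memory instructions rather than a list.

```python
from collections import OrderedDict


class Cache:
    """Hypothetical single-level set-associative cache with LRU replacement."""

    def __init__(self, num_sets=64, ways=4, line_bytes=64):
        self.num_sets = num_sets
        self.ways = ways
        self.line_bytes = line_bytes
        # One ordered dict per set; the oldest entry is the LRU line.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Record one memory access; return True on a hit."""
        line = addr // self.line_bytes
        cache_set = self.sets[line % self.num_sets]
        if line in cache_set:
            cache_set.move_to_end(line)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)  # evict the LRU line
        cache_set[line] = True
        return False


def simulate(addresses, **cache_params):
    """Feed an address stream through a fresh cache; return (hits, misses)."""
    cache = Cache(**cache_params)
    for addr in addresses:
        cache.access(addr)
    return cache.hits, cache.misses
```

For example, `simulate([0, 64, 0, 128, 64], num_sets=1, ways=2)` walks five accesses through a one-set, two-way cache and returns the hit and miss counts.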
Asim: A Performance Model Framework
Asim provides a modular and reusable framework for creating many models. Its modularity helps break down the performance-modeling problem into individual pieces that can be modeled separately, while its reusability allows using a software component repeatedly in different contexts.
Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors
- C. Luk
- Computer Science, Proceedings 28th Annual International Symposium…
- 1 June 2001
By using software to control pre-execution, this paper is able to handle some of the most important access patterns that are typically difficult to prefetch, and offers an average speedup of 24% in a set of irregular applications, which is a 19% speedup over state-of-the-art software-controlled prefetching.
SD3: A Scalable Approach to Dynamic Data-Dependence Profiling
This paper proposes a scalable approach to data-dependence profiling, called SD3, that addresses both runtime and memory overhead in a single framework. It reduces runtime overhead by parallelizing the dependence profiling step itself, and reduces memory overhead by compressing memory accesses that exhibit stride patterns and computing data dependences directly in the compressed format.
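The idea of compressing strided accesses and testing for dependences without expanding them can be sketched as follows. This is a minimal illustration, not SD3's actual algorithm: the `(base, stride, count)` run format, the minimum run length of three, and both function names are assumptions made for the example. The conflict check combines a range-overlap test with a GCD divisibility test, so it is conservative: it may report a possible conflict that does not occur, but it never misses a real one.

```python
from math import gcd


def compress_strides(addresses, min_run=3):
    """Greedily compress an address list into (base, stride, count) runs.

    Runs shorter than min_run are emitted as single points (stride 0, count 1).
    """
    runs = []
    i, n = 0, len(addresses)
    while i < n:
        base = addresses[i]
        count = 1
        stride = 0
        if i + 1 < n:
            stride = addresses[i + 1] - base
            count = 2
            while (i + count < n and
                   addresses[i + count] - addresses[i + count - 1] == stride):
                count += 1
        if count >= min_run:
            runs.append((base, stride, count))
            i += count
        else:
            runs.append((base, 0, 1))
            i += 1
    return runs


def may_conflict(run_a, run_b):
    """Conservatively test whether two runs can touch a common address."""
    b1, s1, n1 = run_a
    b2, s2, n2 = run_b
    lo1, hi1 = sorted((b1, b1 + s1 * (n1 - 1)))
    lo2, hi2 = sorted((b2, b2 + s2 * (n2 - 1)))
    if hi1 < lo2 or hi2 < lo1:
        return False  # address ranges are disjoint
    g = gcd(s1, s2)
    if g == 0:
        return b1 == b2  # both runs are single points
    # b1 + s1*i == b2 + s2*j has integer solutions only if g divides b2 - b1.
    return (b2 - b1) % g == 0
```

Here `compress_strides([100, 104, 108, 112, 500, 7, 9, 11])` stores eight accesses as three runs, and `may_conflict((100, 4, 4), (102, 4, 4))` rules out a dependence between two interleaved strided streams without enumerating their addresses.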
Analyzing Parallel Programs with PIN
Pin is a software system that performs runtime binary instrumentation of Linux and Microsoft Windows applications and aims to provide an instrumentation platform for building a wide variety of program analysis tools, called pintools.
PinOS: a programmable framework for whole-system dynamic instrumentation
By inheriting the powerful instrumentation API from Pin and adding new APIs for system-level instrumentation, PinOS can be used to write system-wide instrumentation tools for tasks like program analysis and architectural studies.