The promise of STM may be undermined by its overheads and its workload applicability.
Calling context profiles are used in many inter-procedural code optimizations and in overall program understanding. Unfortunately, the collection of profile information is highly intrusive due to the high frequency of method calls in most applications. Previously proposed calling-context profiling mechanisms consequently suffer from either low accuracy,…
On the twentieth anniversary of the original publication, following ten years of intense activity in the research literature, hardware support for transactional memory (TM) has finally become a commercial reality, with HTM-enabled chips currently or soon to be available from many hardware vendors. In this paper we describe architectural support for TM…
The use of the Java programming language for implementing server-side application logic is increasing in popularity, yet very little is known about the architectural requirements of this emerging commercial workload. We present a detailed characterization of the Transaction Processing Performance Council's TPC-W web benchmark, implemented in Java. The TPC-W…
Precise and accurate simulation of processors and computer systems is a painstaking, time-consuming, and error-prone task. Abstraction and simplification are powerful tools for reducing the cost and complexity of simulation, but are known to reduce precision. Similarly, limiting and simplifying the workloads that are used to drive simulation can simplify…
Conventional out-of-order processors employ a multi-ported, fully-associative load queue to guarantee correct memory reference order both within a single thread of execution and across threads in a multiprocessor system. As improvements in process technology and pipelining lead to higher clock frequencies, scaling this complex structure to accommodate a larger…
We introduce a new technique for automated performance diagnosis, using the program's callgraph. We discuss our implementation of this diagnosis technique in the Paradyn Performance Consultant. Our implementation includes the new search strategy and new dynamic instrumentation to resolve pointer-based dynamic call sites at run-time. We compare the…
This paper explores the interaction of value prediction with thread-level parallelism techniques, including multithreading and multiprocessing, where correctness is defined by a memory consistency model. Value prediction subtly interacts with the memory consistency model by allowing data dependent instructions to be reordered. We find that predicting a…
Software transactional memory (STM) systems are an attractive environment to evaluate optimistic concurrency. We describe our experience of supporting and optimizing an STM system at both the managed runtime and compiler levels. We describe the design policies of our STM system, and the statistics collected by the runtime to identify performance…
We present an algorithm for dynamically verifying that the execution of a multithreaded program is sequentially consistent. The algorithm uses a vector-timestamp logical time mechanism to construct and verify the acyclic nature of an execution's constraint graph.
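The acyclicity condition in the last abstract can be illustrated with a minimal sketch. It assumes the constraint graph has already been built as a list of directed edges between numbered memory operations (program order plus read-after-write, write-after-read, and write-after-write dependences). The function name `is_acyclic` and the edge encoding are hypothetical, and the check uses a plain offline topological sort (Kahn's algorithm), not the vector-timestamp mechanism the abstract describes.

```python
from collections import defaultdict, deque


def is_acyclic(num_ops, edges):
    """Return True if the directed graph over operations 0..num_ops-1 has no cycle.

    `edges` is an iterable of (src, dst) pairs meaning "src must be ordered
    before dst". This encoding is an assumption made for illustration.
    """
    successors = defaultdict(list)
    indegree = [0] * num_ops
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    # Kahn's algorithm: repeatedly remove operations with no unmet predecessors.
    ready = deque(op for op in range(num_ops) if indegree[op] == 0)
    ordered = 0
    while ready:
        op = ready.popleft()
        ordered += 1
        for nxt in successors[op]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Every operation was ordered exactly when the graph contains no cycle.
    return ordered == num_ops


# Store-buffering example (hypothetical numbering):
#   thread 0: op0 = write x, op1 = read y
#   thread 1: op2 = write y, op3 = read x
program_order = [(0, 1), (2, 3)]

# Both reads return the initial value, so each read precedes the other
# thread's write: op1 -> op2 and op3 -> op0. The graph has a cycle.
both_reads_stale = program_order + [(1, 2), (3, 0)]

# op3 instead observes op0's write (op0 -> op3). The graph is acyclic.
one_read_fresh = program_order + [(1, 2), (0, 3)]
```

In this sketch, a cycle in the graph means no single total order of the operations can satisfy every ordering constraint, which is why the first execution is not sequentially consistent while the second one is.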