Christopher J. Hughes

Learn More
Intel has recently introduced Intel R Transactional Synchronization Extensions (Intel R TSX) in the Intel 4th Generation Core TM Processors. With Intel TSX, a processor can dynamically determine whether threads need to serialize through lock-protected critical sections. In this paper, we evaluate the first hardware implementation of Intel TSX using a set of(More)
General-purpose processors are expected to be increasingly employed for multimedia workloads on systems where reducing energy consumption is an important goal. Researchers have proposed the use of two forms of hardware adaptation - architectural adaptation and dynamic voltage (and frequency) scaling or DVS - to reduce energy. This paper develops and(More)
Chip multiprocessors (CMPs) are now commonplace, and the number of cores on a CMP is likely to grow steadily. However, in order to harness the additional compute resources of a CMP, applications must expose their thread-level parallelism to the hardware. One common approach to doing this is to decompose a program into parallel "tasks" and allow an(More)
Simultaneous multithreading (SMT) improves processor throughput by processing instructions from multiple threads each cycle. This is the first work to explore soft real-time scheduling on an SMT processor. Scheduling with SMT requires two decisions: (1) which threads to run simultaneously (the co-schedule), and (2) how to share processor resources among(More)
The purpose of this study was to compare shoulder range-of-motion (ROM) and strength values between bodybuilders and nonbodybuilders. Fifty-four men (29 bodybuilders and 25 nonbodybuilders) between the ages of 21 and 34 years participated in the study. Goniometric measurements were used to assess shoulder flexion and internal and external rotation ROM.(More)
<i>This paper explores Speculative Precomputation, a technique that uses idle thread context in a multithreaded architecture to improve performance of single-threaded applications. It attacks program stalls from data cache misses by pre-computing future memory accesses in available thread contexts, and prefetching these data. This technique is evaluated by(More)
High performance parallel programs are currently difficult to write and debug. One major source of difficulty is protecting concurrent accesses to shared data with an appropriate synchronization mechanism. Locks are the most common mechanism but they have a number of disadvantages, including possibly unnecessary serialization, and possible deadlock.(More)
<i>Multimedia applications are an increasingly important workload for general-purpose processors. This paper analyzes frame-level execution time variability for several multimedia applications on general-purpose architectures. There are two reasons for such an analysis. First, it has been conjectured that complex features of such architectures (e.g.,(More)