Jan Christian Meyer

Demand-based dependence graphs (DDGs), such as the (Regionalized) Value State Dependence Graph ((R)VSDG), are intermediate representations (IRs) well suited for a wide range of program transformations. They explicitly model the flow of data and state, and only implicitly represent a restricted form of control flow. These features make DDGs especially …
A new approach for collecting data from many hundreds of thousands of microcrystals using X-ray pulses from a free-electron laser has recently been developed. In this approach, referred to as serial crystallography, diffraction patterns are recorded at a constant rate as a suspension of protein crystals flows across the path of an X-ray beam. Events that by chance contain …
Driven by the utilization wall and the Dark Silicon effect, energy efficiency has become a key research area in microprocessor design. Vectorization, parallelization, specialization and heterogeneity are the key design points to deal with the utilization wall. Heterogeneous architectures are enhanced with architectural optimizations, such as vectorization, …
Fast bit-reversal algorithms have been of strong interest for many decades, especially after Cooley and Tukey introduced their FFT implementation in 1965. Many recent algorithms, including FFTW, try to avoid the bit-reversal altogether by using in-place algorithms within their FFTs. We therefore motivate our work by showing that for FFTs of up to 65,536 …
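For readers unfamiliar with the operation, the following is a minimal sketch of a plain bit-reversal permutation as used to reorder FFT inputs. It is a textbook loop, not the fast algorithm the paper proposes, and the function names `bit_reverse` and `bit_reverse_permute` are hypothetical, chosen only for illustration.

```python
def bit_reverse(index: int, bits: int) -> int:
    """Return `index` with its lowest `bits` bits in reversed order."""
    result = 0
    for _ in range(bits):
        # Shift the lowest remaining bit of index into the result.
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reverse_permute(values: list) -> list:
    """Return a copy of `values` reordered by bit-reversed index.

    Assumes len(values) is a power of two, as in radix-2 FFTs.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    return [values[bit_reverse(i, bits)] for i in range(n)]


if __name__ == "__main__":
    print(bit_reverse_permute(list(range(8))))
```

For eight elements, index 1 (binary 001) swaps with index 4 (binary 100), and index 3 (011) swaps with index 6 (110), while palindromic indices such as 0, 2, 5 and 7 stay in place.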
In 2006, John Mellor-Crummey and Michael Scott received the Dijkstra Prize in distributed computing for their 1991 paper on algorithms for scalable synchronization on shared memory multiprocessors, which included a novel spin-lock algorithm (a.k.a. MCS spin-lock) that carefully distributes spin locations in memory to lessen the impact of bandwidth …
In this paper, we apply a method for extracting a running power estimate of applications from hardware performance counters, producing power/time curves which can be integrated over particular intervals to estimate the energy consumption of individual application stages. We use this method to instrument executions of a conjugate gradient solver, to examine …
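The integration step mentioned above can be sketched as a trapezoidal sum over sampled power values. This is an illustrative assumption about how such a curve might be integrated, not the paper's implementation: how the power estimates are derived from performance counters is not shown, and the function name `energy_joules` and the sample data are hypothetical.

```python
def energy_joules(times_s, power_w, start_s, end_s):
    """Estimate energy (J) over [start_s, end_s] from power samples (W).

    Uses the trapezoidal rule with linear interpolation at the interval
    boundaries. Assumes times_s is sorted in increasing order.
    """
    if len(times_s) != len(power_w):
        raise ValueError("times_s and power_w must have equal length")
    energy = 0.0
    for i in range(len(times_s) - 1):
        t0, t1 = times_s[i], times_s[i + 1]
        p0, p1 = power_w[i], power_w[i + 1]
        # Clip this sample segment to the requested interval.
        lo, hi = max(t0, start_s), min(t1, end_s)
        if hi <= lo:
            continue
        slope = (p1 - p0) / (t1 - t0)
        p_lo = p0 + slope * (lo - t0)
        p_hi = p0 + slope * (hi - t0)
        energy += 0.5 * (p_lo + p_hi) * (hi - lo)
    return energy


if __name__ == "__main__":
    # Hypothetical samples: 10 W for one second, ramping to 20 W, then flat.
    times = [0.0, 1.0, 2.0, 3.0]
    power = [10.0, 10.0, 20.0, 20.0]
    print(energy_joules(times, power, 1.0, 2.0))
```

Calling the function once per application stage, with that stage's start and end timestamps, gives a per-stage energy estimate from a single power/time curve.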
While Moore’s law states that the number of transistors is approximately doubled every 2 years, powering these transistors simultaneously is only possible as long as Dennard scaling continues. Unfortunately, voltage scaling has slowed down in recent years, and microprocessor designers have hit what is known as the “utilization wall” or the “dark silicon” …
The heterogeneous communication characteristics of clustered SMP systems create great potential for optimizations which favor physical locality. This paper describes a novel technique for automating such optimizations, applied to barrier operations. Portability poses a challenge when optimizing for locality, as costs are bound to variations in platform …
Predicting how well applications may run on modern systems is becoming increasingly challenging. It is no longer sufficient to look at the number of floating point operations and communication costs; one also needs to model the underlying systems and how their topology, heterogeneity, system loads, etc., may impact performance. This work focuses on …