Jisheng Zhao

Existing dynamic race detectors suffer from at least one of the following three limitations: (i) space overhead per memory location grows linearly with the number of parallel threads [13], severely limiting the parallelism that the algorithm can handle; (ii) sequentialization: the parallel program must be processed in a sequential order, …
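The per-location space growth mentioned in (i) is characteristic of vector-clock-based detectors, which keep one clock entry per thread for every monitored location. A minimal sketch of such shadow state follows, for illustration only; the class and field names are assumptions and are not taken from the cited work.

    // Illustrative shadow state for one monitored memory location in a
    // vector-clock-style race detector: space per location is O(numThreads).
    final class ShadowWord {
        private final long[] lastReadClock;   // one entry per thread
        private long lastWriteClock;
        private int lastWriterTid;

        ShadowWord(int numThreads) {
            this.lastReadClock = new long[numThreads];  // grows linearly with thread count
        }

        // Record a read by thread `tid` at its current logical clock.
        synchronized void onRead(int tid, long clock) {
            lastReadClock[tid] = Math.max(lastReadClock[tid], clock);
        }

        // Record a write; a real detector would first compare against all
        // reader entries to check for a read-write race before updating.
        synchronized void onWrite(int tid, long clock) {
            lastWriteClock = clock;
            lastWriterTid = tid;
        }
    }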
Modern computer systems feature multiple homogeneous or heterogeneous computing units with deep memory hierarchies, and expect a high degree of thread-level parallelism from the software. Exploitation of data locality is critical to achieving scalable parallelism, but adds a significant dimension of complexity to performance optimization of parallel …
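As a concrete illustration of exploiting data locality, the blocked matrix multiply below reuses small tiles of the operands while they are still resident in cache. The code and tile size are a generic sketch, not taken from the work above.

    // Blocked (tiled) matrix multiply: iterating over B x B tiles improves
    // cache reuse compared to the straightforward triple loop.
    static void blockedMatMul(double[][] a, double[][] b, double[][] c, int n) {
        final int B = 64;  // tile size; illustrative, tuned per cache in practice
        for (int ii = 0; ii < n; ii += B)
            for (int kk = 0; kk < n; kk += B)
                for (int jj = 0; jj < n; jj += B)
                    for (int i = ii; i < Math.min(ii + B, n); i++)
                        for (int k = kk; k < Math.min(kk + B, n); k++) {
                            double aik = a[i][k];
                            for (int j = jj; j < Math.min(jj + B, n); j++)
                                c[i][j] += aik * b[k][j];
                        }
    }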
Accelerating program performance via SIMD vector units is very common in modern processors, as evidenced by the use of SSE, MMX, VSE, and VSX SIMD instructions in multimedia, scientific, and embedded applications. To take full advantage of the vector capabilities, a compiler needs to generate efficient vector code automatically. However, most commercial and …
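For a sense of what the target of vectorization looks like, the sketch below hand-vectorizes an element-wise multiply with the JDK's incubating Vector API (run with --add-modules jdk.incubator.vector). It illustrates vector code in general, not the compiler techniques referred to above.

    import jdk.incubator.vector.FloatVector;
    import jdk.incubator.vector.VectorSpecies;

    final class VectorMul {
        private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

        // c[i] = a[i] * b[i], processed SPECIES.length() lanes at a time.
        static void mul(float[] a, float[] b, float[] c) {
            int i = 0;
            int upper = SPECIES.loopBound(a.length);
            for (; i < upper; i += SPECIES.length()) {
                FloatVector va = FloatVector.fromArray(SPECIES, a, i);
                FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
                va.mul(vb).intoArray(c, i);
            }
            for (; i < a.length; i++) {   // scalar remainder loop
                c[i] = a[i] * b[i];
            }
        }
    }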
An Intermediate Language (IL) specifies a program at a level of abstraction that includes precise semantics for state updates and control flow, but leaves unspecified the low-level software and hardware mechanisms that will be used to implement the semantics. Past ILs have followed the Von Neumann execution model by making sequential execution the default, …
Thymic stromal lymphopoietin (TSLP) is said to increase expression of chemokines attracting Th2 T cells. We hypothesized that asthma is characterized by elevated bronchial mucosal expression of TSLP and Th2-attracting, but not Th1-attracting, chemokines as compared with controls, with selective accumulation of cells bearing receptors for these chemokines. …
A major productivity hurdle for parallel programming is the presence of data races. Data races can lead to all kinds of harmful program behaviors, including determinism violations and corrupted memory. However, runtime overheads of current dynamic data race detectors are still prohibitively large (often incurring slowdowns of 10× or more) for use in …
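As a reminder of what a dynamic detector has to catch, the fragment below contains a textbook data race on an unsynchronized counter. It is a generic example, not a benchmark from the work above.

    // Two threads increment a shared field without synchronization:
    // the read-modify-write in `count++` can interleave, losing updates,
    // which is exactly the class of bug a dynamic race detector reports.
    final class RacyCounter {
        static int count = 0;  // shared, unguarded

        public static void main(String[] args) throws InterruptedException {
            Runnable body = () -> {
                for (int i = 0; i < 1_000_000; i++) {
                    count++;  // racy access
                }
            };
            Thread t1 = new Thread(body);
            Thread t2 = new Thread(body);
            t1.start(); t2.start();
            t1.join(); t2.join();
            System.out.println(count);  // usually < 2_000_000 because of the race
        }
    }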
Type systems that prevent data races are a powerful tool for parallel programming, eliminating whole classes of bugs that are both hard to find and hard to fix. Unfortunately, it is difficult to apply previous such type systems to “real” programs, as each of them is designed around a specific synchronization primitive or parallel pattern, such as locks or …
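One example of the kind of primitive-specific discipline alluded to above is a lock-based one: annotations such as @GuardedBy (available, for instance, in the JSR-305/JCIP annotations jar) tie each field to the lock that must protect it, but say nothing about fork-join or other synchronization patterns. A minimal sketch, assuming that annotations jar is on the classpath:

    import javax.annotation.concurrent.GuardedBy;  // assumes the JSR-305/JCIP annotations jar

    // A lock-centric discipline: a checker verifies that `balance` is only
    // touched while `lock` is held, but it cannot express race freedom that
    // comes from fork-join structure or other parallel patterns.
    final class Account {
        private final Object lock = new Object();

        @GuardedBy("lock")
        private long balance;

        void deposit(long amount) {
            synchronized (lock) {
                balance += amount;  // OK: the guarding lock is held
            }
        }
    }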
The initial wave of programming models for general-purpose computing on GPUs, led by CUDA and OpenCL, has provided experts with low-level constructs to obtain significant performance and energy improvements on GPUs. However, these programming models are characterized by a challenging learning curve for non-experts due to their complex and low-level APIs. …
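The programming model studied in that work is not shown here; as a rough point of contrast with CUDA/OpenCL kernels, the fragment below expresses a data-parallel vector operation in ordinary Java, which is the level of abstraction high-level GPU programming approaches aim to accept.

    import java.util.stream.IntStream;

    // A data-parallel SAXPY written as plain Java: no device buffers, kernels,
    // or work-group sizes as in CUDA/OpenCL. High-level GPU programming models
    // for Java aim to start from code at roughly this level of abstraction.
    final class Saxpy {
        static void saxpy(float alpha, float[] x, float[] y, float[] out) {
            IntStream.range(0, x.length)
                     .parallel()
                     .forEach(i -> out[i] = alpha * x[i] + y[i]);
        }
    }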
Increasing the number of instructions executing in parallel has helped improve processor performance, but the technique is limited. Executing code on parallel threads and processors has fewer limitations, but most computer programs tend to be serial in nature. This paper presents a compiler optimisation that at run-time parallelises code inside a JVM and …
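The sketch below is a hand-written illustration of the kind of transformation a run-time parallelizing optimisation would perform automatically inside the JVM: a serial loop with independent iterations rewritten to run on multiple threads. The method names are invented for illustration.

    import java.util.stream.IntStream;

    final class LoopParallelization {
        // Original serial form.
        static void scaleSerial(double[] a, double factor) {
            for (int i = 0; i < a.length; i++) {
                a[i] *= factor;
            }
        }

        // Parallelized form: legal only because iterations are independent,
        // the property a parallelizing compiler must establish before transforming.
        static void scaleParallel(double[] a, double factor) {
            IntStream.range(0, a.length).parallel().forEach(i -> a[i] *= factor);
        }
    }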