Learn More
Irregular and dynamic parallel applications pose significant challenges to achieving scalable performance on large-scale multicore clusters. These applications often require ongoing, dynamic load balancing in order to maintain efficiency. Scalable dynamic load balancing on large clusters is a challenging problem which can be addressed with distributed(More)
GPUs are a class of specialized parallel architectures with tremendous computational power. The new Compute Unified Device Architecture (CUDA) programming model from NVIDIA facilitates programming of general purpose applications on their GPUs. However, manual development of high-performance parallel code for GPUs is still very challenging. In this paper, a(More)
This paper provides an overview of a program synthesis system for a class of quantum chemistry computations. These computations are expressible as a set of tensor contractions and arise in electronic structure modeling. The input to the system is a a high-level specification of the computation, from which the system can synthesize high-performance parallel(More)
Solving large, irregular graph problems efficiently is challenging. Current software systems and commodity multiprocessors do not support fine-grained, irregular parallelism well. We present XWS, the X10 work stealing framework, an open-source runtime for the parallel programming language X10 and a library to be used directly by application writers. XWS(More)
The polyhedral model provides powerful abstractions to optimize loop nests with regular accesses. Affine transformations in this model capture a complex sequence of execution-reordering loop transformations that can improve performance by parallelization as well as locality enhancement. Although a significant body of research has addressed affine scheduling(More)
Performance optimization of stencil computations has been widely studied in the literature, since they occur in many computationally intensive scientific and engineering applications. Compiler frameworks have also been developed that can transform sequential stencil codes for optimization of data locality and parallelism. However, loop skewing is typically(More)
Resolution of acute inflammation is an active process that involves the biosynthesis of specialized proresolving lipid mediators. Among them, resolvin D1 (RvD1) actions are mediated by two G protein-coupled receptors (GPCRs), ALX/FPR2 and GPR32, that also regulate specific microRNAs (miRNAs) and their target genes in novel resolution circuits. We report the(More)
The computational power provided by many-core graphics processing units (GPUs) has been exploited in many applications. The programming techniques currently employed on these GPUs are not sufficient to address problems exhibiting irregular, and unbalanced workload. The problem is exacerbated when trying to effectively exploit multiple GPUs concurrently,(More)
We present here a report produced by a workshop on " Addressing Failures in Exascale Computing " held in Park City, Utah, August 4–11, 2012. The charter of this workshop was to establish a common taxonomy about resilience across all the levels in a computing system; discuss existing knowledge on resilience across the various hardware and software layers of(More)