Sriram Krishnamoorthy

Learn More
Endogenous mechanisms that act in the resolution of acute inflammation are essential for host defense and the return to homeostasis. Resolvin D1 (RvD1), biosynthesized during resolution, displays potent and stereoselective anti-inflammatory actions, such as limiting neutrophil infiltration and proresolving actions. Here, we demonstrate that RvD1 actions on(More)
Irregular and dynamic parallel applications pose significant challenges to achieving scalable performance on large-scale multicore clusters. These applications often require ongoing, dynamic load balancing in order to maintain efficiency. Scalable dynamic load balancing on large clusters is a challenging problem which can be addressed with distributed(More)
GPUs are a class of specialized parallel architectures with tremendous computational power. The new Compute Unified Device Architecture (CUDA) programming model from NVIDIA facilitates programming of general purpose applications on their GPUs. However, manual development of high-performance parallel code for GPUs is still very challenging. In this paper, a(More)
Performance optimization of stencil computations has been widely studied in the literature, since they occur in many computationally intensive scientific and engineering applications. Compiler frameworks have also been developed that can transform sequential stencil codes for optimization of data locality and parallelism. However, loop skewing is typically(More)
Several parallel architectures such as GPUs and the Cell processor have fast explicitly managed on-chip memories, in addition to slow off-chip memory. They also have very high computational power with multiple levels of parallelism. A significant challenge in programming these architectures is to effectively exploit the parallelism available in the(More)
This paper provides an overview of a program synthesis system for a class of quantum chemistry computations. These computations are expressible as a set of tensor contractions and arise in electronic structure modeling. The input to the system is a a high-level specification of the computation, from which the system can synthesize high-performance parallel(More)
The polyhedral model provides powerful abstractions to optimize loop nests with regular accesses. Affine transformations in this model capture a complex sequence of execution-reordering loop transformations that can improve performance by parallelization as well as locality enhancement. Although a significant body of research has addressed affine scheduling(More)
Resolution of acute inflammation is an active process that involves the biosynthesis of specialized proresolving lipid mediators. Among them, resolvin D1 (RvD1) actions are mediated by two G protein-coupled receptors (GPCRs), ALX/FPR2 and GPR32, that also regulate specific microRNAs (miRNAs) and their target genes in novel resolution circuits. We report the(More)
The computational power provided by many-core graphics processing units (GPUs) has been exploited in many applications. The programming techniques currently employed on these GPUs are not sufficient to address problems exhibiting irregular, and unbalanced workload. The problem is exacerbated when trying to effectively exploit multiple GPUs concurrently,(More)
Solving large, irregular graph problems efficiently is challenging. Current software systems and commodity multiprocessors do not support fine-grained, irregular parallelism well. We present XWS, the X10 work stealing framework, an open-source runtime for the parallel programming language X10 and a library to be used directly by application writers. XWS(More)