Prakash Prabhu

Learn More
The performance benefits of GPU parallelism can be enormous, but unlocking this performance potential is challenging. The applicability and performance of GPU parallelizations is limited by the complexities of CPU-GPU communication. To address these communications problems, this paper presents the first fully automatic system for managing and optimizing(More)
GPUs are flexible parallel processors capable of accelerating real applications. To exploit them, programmers must ensure a consistent program state between the CPU and GPU memories by managing data. Manually managing data is tedious and error-prone. In prior work on automatic CPU-GPU data management, alias analysis quality limits performance, and(More)
Sequential programming models express a total program order, of which a partial order must be respected. This inhibits parallelizing tools from extracting scalable performance. Programmer written semantic commutativity assertions provide a natural way of relaxing this partial order, thereby exposing parallelism implicitly in a program. Existing implicit(More)
Automatic parallelization is a promising strategy to improve application performance in the multicore era. However, common programming practices such as the reuse of data structures introduce artificial constraints that obstruct automatic parallelization. Privatization relieves these constraints by replicating data structures, thus enabling scalable(More)
Computing plays an indispensable role in scientific research. Presently, researchers in science have different problems, needs, and beliefs about computation than professional programmers. In order to accelerate the progress of science, computer scientists must understand these problems, needs, and beliefs. To this end, this paper presents a survey of(More)
C++ Exceptions provide a useful way for dealing with abnormal program behavior, but often lead to irregular interprocedural control flow that complicates compiler optimizations and static analysis. In this paper, we present an interprocedural exception analysis and transformation framework for C++ that captures the control-flow induced by exceptions and(More)
Context sensitive pointer analyses based on Whaley and Lam’s bddbddb system have been shown to scale to large Java programs. We provide a technique to incorporate flow sensitivity for Java fields into one such analysis and obtain an escape analysis based on it. First, we express an intraprocedural field flow sensitive analysis, using Fink et al.’s Heap(More)
Automatic parallelization has shown promise in producing scalable multi-threaded programs for multi-core architectures. Most existing automatic techniques parallelize independent loops and insert global synchronization between loop invocations. For programs with many loop invocations, frequent synchronization often becomes the performance bottleneck. Some(More)