Barbara Kreaseck

In modern computers, a program's data locality can affect performance significantly. This paper details full sparse tiling, a run-time reordering transformation that improves the data locality for stationary iterative methods such as Gauss–Seidel operating on sparse matrices. In scientific applications such as finite element analysis, these iterative …
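For context on the kernel being reordered, here is a minimal sketch of one Gauss–Seidel sweep over a sparse matrix stored in compressed sparse row (CSR) form; the function and array names (csr_gauss_seidel, row_ptr, col_idx, vals) are illustrative, not taken from the paper.

    /* One Gauss-Seidel sweep over a CSR sparse matrix: approximately
     * solve Ax = b by updating each unknown in place, using the
     * freshly updated values of earlier unknowns. */
    #include <stddef.h>

    void csr_gauss_seidel(size_t n,
                          const size_t *row_ptr,  /* n+1 row offsets */
                          const size_t *col_idx,  /* column of each nonzero */
                          const double *vals,     /* nonzero values */
                          const double *b,        /* right-hand side */
                          double *x)              /* solution, updated in place */
    {
        for (size_t i = 0; i < n; i++) {
            double sum = b[i];
            double diag = 1.0;
            for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++) {
                size_t j = col_idx[k];
                if (j == i)
                    diag = vals[k];          /* remember the diagonal */
                else
                    sum -= vals[k] * x[j];   /* x[j] is already updated for j < i */
            }
            x[i] = sum / diag;
        }
    }

Successive sweeps revisit the same rows and neighbors, which is the intra- and inter-iteration reuse that a run-time reordering such as full sparse tiling tries to keep in cache.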
Finite Element problems are often solved using multi-grid techniques. The most time-consuming part of multi-grid is the iterative smoother, such as Gauss-Seidel. To improve performance, iterative smoothers can exploit parallelism, intra-iteration data reuse, and inter-iteration data reuse. Current methods for parallelizing Gauss-Seidel on irregular grids …
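One common way to parallelize Gauss-Seidel on an irregular grid is multi-coloring: the nodes are grouped so that no two nodes in the same group are coupled by a nonzero, and each group is then relaxed in parallel. A hedged sketch, with illustrative names and OpenMP for the parallel loop, not the paper's specific method:

    /* Color-based parallel Gauss-Seidel. Assumes nodes have been
     * grouped into color classes with no coupling inside a class,
     * so the nodes of one class can be updated concurrently. */
    #include <stddef.h>

    void colored_gauss_seidel(size_t ncolors,
                              const size_t *color_ptr,   /* ncolors+1 offsets into color_node */
                              const size_t *color_node,  /* node ids grouped by color */
                              const size_t *row_ptr, const size_t *col_idx,
                              const double *vals, const double *b, double *x)
    {
        for (size_t c = 0; c < ncolors; c++) {
            /* Nodes sharing a color are independent of one another. */
            #pragma omp parallel for
            for (size_t k = color_ptr[c]; k < color_ptr[c + 1]; k++) {
                size_t i = color_node[k];
                double sum = b[i], diag = 1.0;
                for (size_t p = row_ptr[i]; p < row_ptr[i + 1]; p++) {
                    if (col_idx[p] == i) diag = vals[p];
                    else                 sum -= vals[p] * x[col_idx[p]];
                }
                x[i] = sum / diag;
            }
        }
    }

Coloring exposes parallelism, but by itself does little for the intra- and inter-iteration data reuse the abstract mentions.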
Applications that manipulate sparse data structures contain memory reference patterns that are unknown at compile time due to indirect accesses such as A[B[i]]. To exploit parallelism and improve locality in such applications, prior work has developed a number of run-time reordering transformations (RTRTs). This paper presents the Sparse Polyhedral …
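As a concrete instance of an RTRT, the sketch below implements first-touch data reordering (often called consecutive packing): an inspector walks the index array B once, packs the elements of A in the order they are first referenced, and remaps B to match, so that the executor's A[B[i]] accesses become nearly sequential. This is a generic illustration with made-up names, not the framework the paper presents.

    /* Inspector/executor-style first-touch data reordering.
     * Assumes every entry of B is a valid index into A. */
    #include <stdlib.h>

    void reorder_first_touch(size_t n_iters, size_t *B,  /* index array, remapped in place */
                             double *A, size_t n_data)   /* data array, permuted in place */
    {
        size_t *new_pos = malloc(n_data * sizeof *new_pos);
        double *packed  = malloc(n_data * sizeof *packed);
        size_t next = 0;

        for (size_t d = 0; d < n_data; d++)
            new_pos[d] = (size_t)-1;             /* "not yet placed" */

        /* Inspector: assign new locations in first-touch order. */
        for (size_t i = 0; i < n_iters; i++)
            if (new_pos[B[i]] == (size_t)-1)
                new_pos[B[i]] = next++;
        for (size_t d = 0; d < n_data; d++)      /* untouched data goes last */
            if (new_pos[d] == (size_t)-1)
                new_pos[d] = next++;

        /* Apply the permutation to the data, then remap the indices. */
        for (size_t d = 0; d < n_data; d++) packed[new_pos[d]] = A[d];
        for (size_t d = 0; d < n_data; d++) A[d] = packed[d];
        for (size_t i = 0; i < n_iters; i++) B[i] = new_pos[B[i]];

        free(new_pos);
        free(packed);
    }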
In forward-mode Automatic Differentiation (AD), the derivative program computes a function f and its derivatives, f′. Activity analysis is important for AD. Our results show that when all variables are active, the runtime checks required for dynamic activity analysis incur a significant overhead. However, when as few as half of the input variables are inactive, …
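A minimal illustration of forward-mode AD with a dynamic activity check: each value carries a derivative and an active flag, and the derivative arithmetic is skipped when neither operand is active. The struct and function names are hypothetical.

    /* Dual numbers with an activity flag: runtime checks avoid
     * derivative arithmetic for values no derivative flows through. */
    #include <stdio.h>

    typedef struct { double val, dot; int active; } dual_t;

    static dual_t dmul(dual_t a, dual_t b) {
        dual_t r = { a.val * b.val, 0.0, a.active || b.active };
        if (r.active)                               /* dynamic activity check */
            r.dot = a.dot * b.val + a.val * b.dot;  /* product rule */
        return r;
    }

    int main(void) {
        dual_t x = { 3.0, 1.0, 1 };   /* active input, seeded with dx/dx = 1 */
        dual_t c = { 2.0, 0.0, 0 };   /* inactive constant */
        dual_t y = dmul(dmul(x, x), c);                /* y = 2x^2 */
        printf("y = %g, dy/dx = %g\n", y.val, y.dot);  /* 18 and 12 */
        return 0;
    }

When many values are inactive, the check prunes whole chains of derivative updates; when everything is active, the same check is pure overhead, matching the trade-off the abstract describes.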
Overlapping communication with computation is a well-known technique to increase application performance. While it is commonly assumed that communication and computation can be overlapped at no cost, in reality they interfere with each other. In this paper we empirically evaluate the interference rate of communication on computation via measurements on a …
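The overlap pattern in question looks like the sketch below: post nonblocking sends and receives, compute on data that does not depend on them, then wait. Buffer names and the toy computation are illustrative.

    /* Overlap a halo exchange with interior computation using
     * nonblocking MPI. */
    #include <mpi.h>

    void halo_exchange_overlap(double *send_halo, double *recv_halo, int n_halo,
                               int peer, double *interior, int n_interior)
    {
        MPI_Request reqs[2];

        /* Start the exchange without blocking. */
        MPI_Irecv(recv_halo, n_halo, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(send_halo, n_halo, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[1]);

        /* Overlap: interior points need no halo data. This work still
         * competes with the transfer for memory bandwidth and CPU time,
         * which is the interference being measured. */
        for (int i = 0; i + 1 < n_interior; i++)
            interior[i] = 0.5 * (interior[i] + interior[i + 1]);

        /* Block until the exchange completes, then use recv_halo. */
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    }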
Tomorrow's microprocessors will be able to handle multiple flows of control. Applications that exhibit task-level parallelism (TLP) and can be decomposed into parallel tasks will perform well on these platforms. TLP arises when a task is independent of its neighboring code. Traditional parallel compilers exploit one variety of TLP, loop-level parallelism …
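A minimal illustration of TLP beyond loops: two tasks that share no data can run on separate threads. The task bodies are placeholders.

    /* Two independent tasks executed concurrently with POSIX threads. */
    #include <pthread.h>
    #include <stdio.h>

    static void *task_a(void *arg) { (void)arg; puts("task A"); return NULL; }
    static void *task_b(void *arg) { (void)arg; puts("task B"); return NULL; }

    int main(void) {
        pthread_t ta, tb;
        /* The tasks touch disjoint data, so they may run in parallel. */
        pthread_create(&ta, NULL, task_a, NULL);
        pthread_create(&tb, NULL, task_b, NULL);
        pthread_join(ta, NULL);
        pthread_join(tb, NULL);
        return 0;
    }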