
- Jack Poulson, Bryan Marker, Robert A. van de Geijn, Jeff R. Hammond, Nichols A. Romero
- ACM Trans. Math. Softw.
- 2013

Parallelizing dense matrix computations to distributed memory architectures is a well-studied subject and generally considered to be among the best understood domains of parallel computing. Two packages, developed in the mid 1990s, still enjoy regular use: ScaLAPACK and PLAPACK. With the advent of many-core architectures, which may very well take the shape…
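
A central design point in such distributed dense linear algebra packages is how matrix entries map onto a two-dimensional process grid. The sketch below illustrates a simple element-wise cyclic distribution; it is only an illustration under assumed conventions, not code from any of the packages named above, and the helper name `owner` is hypothetical.

```python
import numpy as np

def owner(i, j, pr, pc):
    """Grid coordinates of the process owning global entry (i, j)
    under an element-wise cyclic distribution on a pr x pc grid."""
    return (i % pr, j % pc)

# Each entry of a 6x6 matrix lands on exactly one process of a 2x3
# grid, and the load is perfectly balanced: 6 entries per process.
pr, pc = 2, 3
counts = np.zeros((pr, pc), dtype=int)
for i in range(6):
    for j in range(6):
        counts[owner(i, j, pr, pc)] += 1
print(counts)
assert (counts == 6).all()
```

The cyclic mapping keeps the load balanced even for submatrices, which is one reason grid-based distributions remain standard in this domain.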

- Bryan Marker, Jack Poulson, Don S. Batory, Robert A. van de Geijn
- VECPAR
- 2012

To implement dense linear algebra algorithms for distributed-memory computers, an expert applies knowledge of the domain, the target architecture, and how to parallelize common operations. This is often a rote process that becomes tedious for a large collection of algorithms. We have developed a way to encode this expert knowledge such that it can be…

- Laurent Demanet, Matthew Ferrara, Nicholas Maxwell, Jack Poulson, Lexing Ying
- SIAM J. Imaging Sciences
- 2012

In spite of an extensive literature on fast algorithms for synthetic aperture radar (SAR) imaging, it is not currently known if it is possible to accurately form an image from N data points in provable near-linear time complexity. This paper seeks to close this gap by proposing an algorithm which runs in complexity O(N log N log(1/ε)) without making the…

We describe an extension of the Scalable Universal Matrix Multiplication Algorithm (SUMMA) from 2D to 3D process grids; the underlying idea is to lower the communication volume through storing redundant copies of one or more matrices. While SUMMA was originally introduced for block-wise matrix distributions, so that most of its communication was within…
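
The structure of 2D SUMMA can be sketched serially: at each step a block column of A is broadcast within each process row, the matching block row of B within each process column, and every process performs a local rank-kb update. The sketch below only simulates this on one process (the broadcasts are implicit in the slicing), and all names are hypothetical, not code from the paper.

```python
import numpy as np

def summa_2d(A, B, kb):
    """Serial sketch of 2D SUMMA. At step k, a width-kb block column
    of A is (conceptually) broadcast within each process row, and the
    matching block row of B within each process column; all processes
    then apply a local rank-kb update to their piece of C."""
    m, kdim = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    for k in range(0, kdim, kb):
        Apanel = A[:, k:k + kb]   # broadcast along process rows
        Bpanel = B[k:k + kb, :]   # broadcast along process columns
        C += Apanel @ Bpanel      # local rank-kb update
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 12))
B = rng.standard_normal((12, 10))
assert np.allclose(summa_2d(A, B, kb=3), A @ B)
```

The 3D variant described in the abstract adds a third grid dimension so that redundant copies of a matrix trade memory for reduced communication volume.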

- Jack Poulson, Björn Engquist, Siwei Li, Lexing Ying
- SIAM J. Scientific Computing
- 2013

A parallelization of a sweeping preconditioner for 3D Helmholtz equations without large cavities is introduced and benchmarked for several challenging velocity models. The setup and application costs of the sequential preconditioner are shown to be O(γN) and O(γN log N), where γ(ω) denotes the modestly frequency-dependent number of grid points per Perfectly…

- Jack Poulson, Laurent Demanet, Nicholas Maxwell, Lexing Ying
- SIAM J. Scientific Computing
- 2014

The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform ∫_{ℝ^d} K(x, y) g(y) dy at large numbers of target points when the kernel, K(x, y), is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In d dimensions with O(N) source and target points,…
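
The dense baseline that the butterfly algorithm accelerates is the direct O(N²) summation f(x_i) = Σ_j K(x_i, y_j) g(y_j). The sketch below evaluates it naively; the Fourier kernel used as an example is our assumption (a standard instance where the low-rank condition holds), not a case taken from the paper.

```python
import numpy as np

def direct_transform(K, x, y, g):
    """Naive O(N^2) evaluation of f(x_i) = sum_j K(x_i, y_j) g(y_j),
    the dense baseline that fast butterfly-type schemes approximate
    when K is low-rank on admissible subdomain pairs."""
    return np.array([sum(K(xi, yj) * gj for yj, gj in zip(y, g)) for xi in x])

# Illustrative kernel choice: K(x, y) = exp(2*pi*i*x*y), for which
# the discrete transform reduces to an (unnormalized) inverse DFT.
N = 64
x = np.arange(N) / N
y = np.arange(N).astype(float)
g = np.random.default_rng(1).standard_normal(N)
K = lambda xi, yj: np.exp(2j * np.pi * xi * yj)
f = direct_transform(K, x, y, g)
assert np.allclose(f, N * np.fft.ifft(g))
```

For this oscillatory kernel the FFT already gives a fast evaluation; butterfly methods generalize the speedup to kernels with no such closed-form factorization.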

We present a parallel preconditioning method for the iterative solution of the time-harmonic elastic wave equation which makes use of higher-order spectral elements to reduce pollution error. In particular, the method leverages perfectly matched layer boundary conditions to efficiently approximate the Schur complement matrices of a block LDL factorization…

- Jack Poulson, Björn Engquist, Sergey Fomel, Siwei Li, Lexing Ying
- ArXiv
- 2012

- Bryan Marker, Ernie Chan, +4 authors Theodore E. Kubaska
- Concurrency and Computation: Practice and…
- 2012

- Bryan Marker, Ernie Chan, +4 authors Theodore E. Kubaska
- 2011

A message passing, distributed-memory parallel computer on a chip is one possible design for future, many-core architectures. We discuss initial experiences with the Intel Single-chip Cloud Computer research processor, which is a prototype architecture that incorporates 48 cores on a single die that can communicate via a small, shared, on-die buffer. The…