We present new scalable algorithms and a new implementation of our kernel-independent fast multipole method (Ying et al. ACM/IEEE SC '03), in which we employ both distributed memory parallelism (via MPI) and shared memory/streaming parallelism (via GPU acceleration) to rapidly evaluate two-body non-oscillatory potentials. On traditional CPU-only systems, …
We describe our recently released software package Block Locally Optimal Preconditioned Eigenvalue Xolvers (BLOPEX). BLOPEX is available as a stand-alone serial library, as an external package to PETSc ("Portable, Extensible Toolkit for Scientific Computation", a general purpose suite of tools for the scalable solution of partial differential …
We present a fast, petaflop-scalable algorithm for Stokesian particulate flows. Our goal is the direct simulation of blood, which we model as a mixture of a Stokesian fluid (plasma) and red blood cells (RBCs). Directly simulating blood is a challenging multiscale, multiphysics problem. We report simulations with up to 200 million deformable RBCs. The …
This work presents the first extensive study of single-node performance optimization, tuning, and analysis of the fast multipole method (FMM) on modern multi-core systems. We consider single- and double-precision with numerous performance enhancements, including low-level tuning, numerical approximation, data structure transformations, OpenMP …
In this article, we present Dendro, a suite of parallel algorithms for the discretization and solution of partial differential equations that require discretization of second-order elliptic operators. Dendro uses trilinear finite element discretizations constructed using octrees. Dendro, which is built on top of PETSc (Argonne National Laboratory), …
We analyze the conjugate gradient (CG) method with variable preconditioning for solving a linear system with a real symmetric positive definite (SPD) matrix of coefficients A. We assume that the preconditioner is SPD on each step, and that the condition number of the preconditioned system matrix is bounded above by a constant independent of the step …
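To make the setting of the preceding abstract concrete, here is a minimal pure-Python sketch of preconditioned CG in which the preconditioner may change from step to step. It is an illustration under stated assumptions, not the method analyzed in the paper: the function name `pcg_variable`, the callback signature `precond(k, r)`, the Polak-Ribiere-style formula for `beta` (a common choice when the preconditioner varies), and the test problem are all assumptions introduced here.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def pcg_variable(A, b, precond, tol=1e-10, max_iter=200):
    """Solve A x = b for SPD A with a step-dependent preconditioner.

    precond(k, r) is a hypothetical callback returning z = M_k^{-1} r,
    where M_k is SPD and may differ on every step k.
    Returns (x, number_of_iterations_performed).
    """
    x = [0.0] * len(b)
    r = list(b)                      # residual for the zero initial guess
    z = precond(0, r)
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for k in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r_new = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r_new, r_new)) <= tol * b_norm:
            return x, k
        z_new = precond(k, r_new)
        # Polak-Ribiere-style beta: uses the residual difference, which
        # reduces to the usual formula when the preconditioner is fixed.
        diff = [rn - ro for rn, ro in zip(r_new, r)]
        beta = dot(z_new, diff) / rz
        p = [zi + beta * pi for zi, pi in zip(z_new, p)]
        r = r_new
        rz = dot(r, z_new)
    return x, max_iter


# Assumed test problem: 1D Laplacian (tridiagonal, SPD), right-hand side of ones.
n = 20
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n


def precond(k, r):
    # Jacobi (diagonal) preconditioner whose scaling alternates between steps,
    # so M_k is SPD on each step but not constant.
    scale = 1.0 + 0.5 * (k % 2)
    return [ri / (A[i][i] * scale) for i, ri in enumerate(r)]


x, iters = pcg_variable(A, b, precond)
residual = [bi - yi for bi, yi in zip(b, matvec(A, x))]
print("iterations:", iters, "residual norm:", math.sqrt(dot(residual, residual)))
```

The preconditioner is passed as a callback that receives the step index, which keeps the solver itself unaware of how or why the preconditioner changes.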
We present preliminary results of an ongoing project to develop codes of the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method for symmetric eigenvalue problems for the hypre and PETSc software packages. hypre and PETSc provide high-quality domain decomposition and multigrid preconditioning for parallel computers. Our LOBPCG implementation …