Learn More
As an entry for the 2009 Gordon Bell price/performance prize, we present the results of two different hierarchical <i>N</i>-body simulations on a cluster of 256 graphics processing units (GPUs). Unlike many previous <i>N</i>-body simulations on GPUs that scale as <i>O</i>(<i>N</i><sup>2</sup>), the present method calculates the <i>O</i>(<i>N</i> log(More)
Fast multipole methods have O(N) complexity, are compute bound, and require very little synchronization, which makes them a favorable algorithm on next-generation super-computers. Their most common application is to accelerate N-body problems, but they can also be used to solve boundary integral equations. When the particle distribution is irregular and the(More)
We have developed a parallel algorithm for radial basis function (rbf) interpolation that exhibits O(N) complexity, requires O(N) storage, and scales excellently up to a thousand processes. The algorithm uses a gmres iterative solver with a restricted additive Schwarz method (rasm) as a preconditioner and a fast matrix-vector algorithm. Previous fast rbf(More)
We present teraflop-scale calculations of biomolecular electrostatics enabled by the combination of algorithmic and hardware acceleration. The algorithmic acceleration is achieved with the fast multipole method (fmm) in conjunction with a boundary element method (bem) formulation of the continuum electrostatic model, as well as the bibee approximation to(More)
Among the algorithms that are likely to play a major role in future exascale computing, the fast multipole method (FMM) appears as a rising star. Our previous recent work showed scaling of an FMM on GPU clusters, with problem sizes in the order of billions of unknowns. That work led to an extremely parallel FMM, scaling to thousands of GPUs or tens of(More)
Algorithms designed to efficiently solve this classical problem of physics fit very well on GPU hardware, and exhibit excellent scalability on many GPUs. Their computational intensity makes them a promising approach for many other applications amenable to an N-body formulation. Adding features such as auto-tuning makes multipole-type algorithms ideal for(More)
—We present a 0.5 Petaflop/s calculation of homogeneous isotropic turbulence in a cube of 2048 3 particles, using a highly parallel fast multipole method (FMM) using 2048 GPUs on the TSUBAME 2.0 system. We compare this particle-based code with a spectral DNS code under the same calculation condition and the same machine. The results of our particle-based(More)