As an entry for the 2009 Gordon Bell price/performance prize, we present the results of two different hierarchical <i>N</i>-body simulations on a cluster of 256 graphics processing units (GPUs). Unlike many previous <i>N</i>-body simulations on GPUs that scale as <i>O</i>(<i>N</i><sup>2</sup>), the present method calculates the <i>O</i>(<i>N</i> log…

Fast multipole methods have O(N) complexity, are compute bound, and require very little synchronization, which makes them a favorable algorithm on next-generation supercomputers. Their most common application is to accelerate N-body problems, but they can also be used to solve boundary integral equations. When the particle distribution is irregular and the…

We have developed a parallel algorithm for radial basis function (RBF) interpolation that exhibits O(N) complexity, requires O(N) storage, and scales excellently up to a thousand processes. The algorithm uses a GMRES iterative solver with a restricted additive Schwarz method (RASM) as a preconditioner and a fast matrix-vector algorithm. Previous fast RBF…

We present teraflop-scale calculations of biomolecular electrostatics enabled by the combination of algorithmic and hardware acceleration. The algorithmic acceleration is achieved with the fast multipole method (FMM) in conjunction with a boundary element method (BEM) formulation of the continuum electrostatic model, as well as the BIBEE approximation to…

Among the algorithms that are likely to play a major role in future exascale computing, the fast multipole method (FMM) appears as a rising star. Our recent work showed scaling of an FMM on GPU clusters, with problem sizes on the order of billions of unknowns. That work led to an extremely parallel FMM, scaling to thousands of GPUs or tens of…

Algorithms designed to efficiently solve this classical problem of physics fit very well on GPU hardware, and exhibit excellent scalability on many GPUs. Their computational intensity makes them a promising approach for many other applications amenable to an N-body formulation. Adding features such as auto-tuning makes multipole-type algorithms ideal for…

We present a 0.5 Petaflop/s calculation of homogeneous isotropic turbulence in a cube of 2048<sup>3</sup> particles, using a highly parallel fast multipole method (FMM) on 2048 GPUs of the TSUBAME 2.0 system. We compare this particle-based code with a spectral DNS code under the same calculation conditions and on the same machine. The results of our particle-based…

- Rio Yokota
- ArXiv
- 2012

The present work attempts to integrate the independent efforts in the fast N-body community to create the fastest N-body library for many-core and heterogeneous architectures. Focus is placed on low-accuracy optimizations, in response to the recent interest in using FMM as a preconditioner for sparse linear solvers. A direct comparison with other…