- Tsuyoshi Hamada, Tetsu Narumi, Rio Yokota, Kenji Yasuoka, Keigo Nitadori, Makoto Taiji
- Proceedings of the Conference on High Performance…
- 2009

As an entry for the 2009 Gordon Bell price/performance prize, we present the results of two different hierarchical N-body simulations on a cluster of 256 graphics processing units (GPUs). Unlike many previous N-body simulations on GPUs that scale as O(N^2), the present method calculates the O(N log…

- Hatem Ltaief, Rio Yokota
- Concurrency and Computation: Practice and…
- 2014

Fast multipole methods have O(N) complexity, are compute bound, and require very little synchronization, which makes them a favorable algorithm on next-generation supercomputers. Their most common application is to accelerate N-body problems, but they can also be used to solve boundary integral equations. When the particle distribution is irregular and the…

- Rio Yokota, Lorena A. Barba, Matthew G. Knepley
- ArXiv
- 2009

We have developed a parallel algorithm for radial basis function (RBF) interpolation that exhibits O(N) complexity, requires O(N) storage, and scales excellently up to a thousand processes. The algorithm uses a GMRES iterative solver with a restricted additive Schwarz method (RASM) as a preconditioner and a fast matrix-vector algorithm. Previous fast RBF…

Extracting maximum performance from multi-core architectures is a difficult task, primarily due to bandwidth limitations of the memory subsystem and its complex hierarchy. In this work, we study the implications of fork-join and data-driven execution models on this type of architecture at the level of task parallelism. For this purpose, we use a highly…

- Rio Yokota, Jaydeep P. Bardhan, Matthew G. Knepley, Lorena A. Barba, Tsuyoshi Hamada
- Computer Physics Communications
- 2011

We present teraflop-scale calculations of biomolecular electrostatics enabled by the combination of algorithmic and hardware acceleration. The algorithmic acceleration is achieved with the fast multipole method (FMM) in conjunction with a boundary element method (BEM) formulation of the continuum electrostatic model, as well as the BIBEE approximation to…

- Rio Yokota, Tetsu Narumi, Ryuji Sakamaki, Shun Kameoka, Shinnosuke Obi, Kenji Yasuoka
- Computer Physics Communications
- 2009

- Rio Yokota, Lorena A. Barba
- Computing in Science & Engineering
- 2012

Algorithms designed to efficiently solve the classical N-body problem of mechanics fit well on GPU hardware and exhibit excellent scalability on many GPUs. Their computational intensity makes them a promising approach for other applications amenable to an N-body formulation. Adding features such as autotuning makes multipole-type algorithms ideal for…

- Rio Yokota, Lorena A. Barba
- IJHPCA
- 2012

Among the algorithms that are likely to play a major role in future exascale computing, the fast multipole method (FMM) appears as a rising star. Our recent work showed scaling of an FMM on GPU clusters, with problem sizes on the order of billions of unknowns. That work led to an extremely parallel FMM, scaling to thousands of GPUs or tens of…

- Rio Yokota, Tetsu Narumi, Lorena A. Barba, Kenji Yasuoka
- ArXiv
- 2011

- Rio Yokota, Lorena A. Barba, Tetsu Narumi, Kenji Yasuoka
- Computer Physics Communications
- 2013

We present a 0.5 Petaflop/s calculation of homogeneous isotropic turbulence in a cube of 2048^3 particles, using a highly parallel fast multipole method (FMM) on 2048 GPUs of the TSUBAME 2.0 system. We compare this particle-based code with a spectral DNS code under the same calculation conditions and on the same machine. The results of our particle-based…