
- Rio Yokota, Jaydeep P. Bardhan, Matthew G. Knepley, Lorena A. Barba, Tsuyoshi Hamada
- Computer Physics Communications
- 2011

We present teraflop-scale calculations of biomolecular electrostatics enabled by the combination of algorithmic and hardware acceleration. The algorithmic acceleration is achieved with the fast multipole method (FMM) in conjunction with a boundary element method (BEM) formulation of the continuum electrostatic model, as well as the BIBEE approximation to…

- Rio Yokota, Tetsu Narumi, Ryuji Sakamaki, Shun Kameoka, Shinnosuke Obi, Kenji Yasuoka
- Computer Physics Communications
- 2009

- Rio Yokota
- ArXiv
- 2012

The present work attempts to integrate the independent efforts in the fast N-body community to create the fastest N-body library for many-core and heterogeneous architectures. Focus is placed on low-accuracy optimizations, in response to the recent interest in using FMM as a preconditioner for sparse linear solvers. A direct comparison with other…

- Tsuyoshi Hamada, Tetsu Narumi, Rio Yokota, Kenji Yasuoka, Keigo Nitadori, Makoto Taiji
- Proceedings of the Conference on High Performance…
- 2009

As an entry for the 2009 Gordon Bell price/performance prize, we present the results of two different hierarchical N-body simulations on a cluster of 256 graphics processing units (GPUs). Unlike many previous N-body simulations on GPUs that scale as O(N²), the present method calculates the O(N log…
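The O(N²) scaling this entry contrasts against is the direct all-pairs sum. As a hypothetical illustration (not the paper's GPU kernel), a minimal brute-force potential evaluation looks like:

```python
def direct_potential(pos, mass):
    """Brute-force O(N^2) pairwise potential: phi_i = sum_{j != i} m_j / |r_i - r_j|.

    Hierarchical methods (tree codes, FMM) approximate this double loop
    in O(N log N) or O(N) by clustering distant particles.
    """
    n = len(pos)
    phi = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = pos[i][0] - pos[j][0]
            dy = pos[i][1] - pos[j][1]
            dz = pos[i][2] - pos[j][2]
            phi[i] += mass[j] / (dx * dx + dy * dy + dz * dz) ** 0.5
    return phi
```

For two unit masses at unit separation, each particle sees a potential of 1.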

- Rio Yokota, Lorena A. Barba
- IJHPCA
- 2012

Among the algorithms that are likely to play a major role in future exascale computing, the fast multipole method (FMM) appears as a rising star. Our recent work showed scaling of an FMM on GPU clusters, with problem sizes on the order of billions of unknowns. That work led to an extremely parallel FMM, scaling to thousands of GPUs or tens of…

- Hatem Ltaief, Rio Yokota
- Concurrency and Computation: Practice and…
- 2014

Fast multipole methods have O(N) complexity, are compute bound, and require very little synchronization, which makes them a favorable algorithm on next-generation supercomputers. Their most common application is to accelerate N-body problems, but they can also be used to solve boundary integral equations. When the particle distribution is irregular and the…

- Rio Yokota, Lorena A. Barba, Matthew G. Knepley
- ArXiv
- 2009

We have developed a parallel algorithm for radial basis function (RBF) interpolation that exhibits O(N) complexity, requires O(N) storage, and scales excellently up to a thousand processes. The algorithm uses a GMRES iterative solver with a restricted additive Schwarz method (RASM) as a preconditioner and a fast matrix-vector algorithm. Previous fast RBF…
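To make the interpolation problem concrete, here is a small self-contained sketch: a dense direct solve in 1D with a Gaussian kernel. This stands in for the paper's approach, which replaces the dense solve with GMRES, a RASM preconditioner, and a fast matrix-vector product; the kernel and solver here are illustrative assumptions.

```python
import math

def rbf_interpolate(xs, ys, eps=1.0):
    """Fit weights w so that s(x) = sum_j w_j * phi(|x - x_j|) matches ys at the nodes xs.

    phi is a Gaussian RBF, phi(r) = exp(-(eps*r)^2). The dense n x n system is
    solved by Gaussian elimination with partial pivoting, standing in for the
    preconditioned GMRES solve described above.
    """
    n = len(xs)
    A = [[math.exp(-(eps * (xs[i] - xs[j])) ** 2) for j in range(n)] for i in range(n)]
    b = list(ys)
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(A[r][k]))  # partial pivot
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= f * A[k][j]
            b[i] -= f * b[k]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(A[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (b[i] - s) / A[i][i]
    return w

def rbf_eval(x, xs, w, eps=1.0):
    """Evaluate the fitted interpolant s(x)."""
    return sum(w[j] * math.exp(-(eps * (x - xs[j])) ** 2) for j in range(len(xs)))
```

By construction the interpolant reproduces the data exactly at the nodes; the fast variants trade this dense O(N³) solve for O(N) work per iteration.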

- Kenjiro Taura, Jun Nakashima, Rio Yokota, Naoya Maruyama
- 2012 SC Companion: High Performance Computing…
- 2012

This paper describes a task-parallel implementation of ExaFMM, an open-source implementation of fast multipole methods (FMM), using the lightweight task-parallel library MassiveThreads. Although there have been many attempts at parallelizing FMM, experience has almost exclusively been limited to formulations based on flat, homogeneous parallel loops. FMM in…

- Rio Yokota, Lorena A. Barba
- Computing in Science & Engineering
- 2012

Algorithms designed to efficiently solve the classical N-body problem of mechanics fit well on GPU hardware and exhibit excellent scalability on many GPUs. Their computational intensity makes them a promising approach for other applications amenable to an N-body formulation. Adding features such as autotuning makes multipole-type algorithms ideal for…

Extracting maximum performance from multi-core architectures is a difficult task, primarily due to bandwidth limitations of the memory subsystem and its complex hierarchy. In this work, we study the implications of fork-join and data-driven execution models on this type of architecture at the level of task parallelism. For this purpose, we use a highly…
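As a toy illustration of the fork-join model contrasted here, a sketch using Python's standard thread pool (an assumption for illustration, not the runtime used in the work):

```python
from concurrent.futures import ThreadPoolExecutor

def fork_join_sum(data, ntasks=4):
    """Fork-join reduction: fork the array into independent chunks, sum each
    chunk as a task, then join (a global barrier) before the final reduce.

    A data-driven model would instead launch each dependent task as soon as
    its inputs are ready, avoiding the barrier between phases.
    """
    size = (len(data) + ntasks - 1) // ntasks
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=ntasks) as pool:
        partials = list(pool.map(sum, chunks))  # join: blocks until every task finishes
    return sum(partials)
```

The barrier after `pool.map` is exactly the synchronization point that data-driven execution removes.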