# A novel multiple-walk parallel algorithm for the Barnes–Hut treecode on GPUs – towards cost effective, high performance N-body simulation

@article{Hamada2009A, title={ A novel multiple-walk parallel algorithm for the Barnes–Hut treecode on GPUs – towards cost effective, high performance N-body simulation }, author={Tsuyoshi Hamada and Keigo Nitadori and Khaled Benkrid and Yousuke Ohno and Gentaro Morimoto and Tomonari Masada and Yuichiro Shibata and Kiyoshi Oguri and Makoto Taiji}, journal={Computer Science - Research and Development}, year={2009}, volume={24}, pages={21-31} }

Abstract: Recently, general-purpose computation on graphics processing units (GPGPU) has become an increasingly
popular field of study as graphics processing units (GPUs) continue to be proposed as high performance
and relatively low cost implementation platforms for scientific computing applications. These
applications include astrophysical N-body simulations, which form one of the most challenging problems
in computational science. However, in most reported studies, a simple
$ \mathcal…

## 30 Citations

### Barnes-hut treecode on GPU

- Computer Science
- 2010 IEEE International Conference on Progress in Informatics and Computing
- 2010

A new implementation of the tree algorithm on GPUs using CUDA is presented; it obtains more than a 100x speedup when computing forces between bodies and introduces a new tree-building method that improves performance further.

### A sparse octree gravitational N-body code that runs entirely on the GPU processor

- Computer Science
- J. Comput. Phys.
- 2012

### An Efficient CUDA Implementation of the Tree-Based Barnes Hut n-Body Algorithm

- Computer Science
- 2011

### 42 TFlops hierarchical N-body simulations on GPUs with applications in both astrophysics and turbulence

- Physics, Computer Science
- Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
- 2009

The present method calculates the O(N log N) treecode and O(N) fast multipole method (FMM) on GPUs with unprecedented efficiency and demonstrates its performance on one standard application (a gravitational N-body simulation) and one non-standard application (simulation of turbulence using vortex particles).

### A novel parallel algorithm for near-field computation in N-body problem on GPU

- Computer Science
- 2011

A novel, efficient parallel algorithm for the near-field computation in the N-body problem on the graphics processing unit (GPU) architecture is proposed, based on Newton's third law and a Z-order space-filling curve.

### The algorithm mapping of the near-field computation in N-body problem on GPU

- Computer Science
- 2011

This paper discusses the principles of mapping algorithms efficiently onto the graphics processing unit (GPU) architecture from the aspects of task partition and data access by researching the…

### 190 TFlops Astrophysical N-body Simulation on a Cluster of GPUs

- Computer Science, Physics
- 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
- 2010

The results of a hierarchical N-body simulation on DEGIMA, a cluster of PCs with 576 graphics processing units (GPUs) and an InfiniBand interconnect, are presented.

### On the parallelization and performance analysis of Barnes–Hut algorithm using Java parallel platforms

- Computer Science
- SN Applied Sciences
- 2020

Multi-core processors provide time-efficient and cost-effective solutions to execute the algorithms for complex physical systems. However, to efficiently exploit the processing capabilities of the…

### Parallel time-space processing model based fast N-body simulation on GPUs

- Computer Science, Physics
- PMAM '13
- 2013

A novel parallel implementation of N-body gravitational simulation on GPUs is presented; the experimental results show that this method achieves an acceleration of 413 times compared with a CPU and up to 5.5 times compared with other GPU-based methods.

### Directionally unsplit hydrodynamic schemes with hybrid MPI/OpenMP/GPU parallelization in AMR

- Computer Science
- Int. J. High Perform. Comput. Appl.
- 2012

A hybrid MPI/OpenMP model is investigated, which enables the full exploitation of the computing power in a heterogeneous CPU/GPU cluster and significantly improves the overall performance.

## References


### High Performance Direct Gravitational N-body Simulations on Graphics Processing Units

- Computer Science
- ArXiv
- 2007

### GPGPU: general-purpose computation on graphics hardware

- Computer Science
- SC
- 2006

The graphics processor (GPU) on today's commodity video cards has evolved into an extremely powerful and flexible processor. Modern graphics architectures provide tremendous memory bandwidth and…

### The Chamomile Scheme: An Optimized Algorithm for N-body simulations on Programmable Graphics Processing Units

- Computer Science
- 2007

An algorithm named "Chamomile Scheme" is presented, fully optimized for calculating gravitational interactions on the latest programmable graphics processing unit (GPU), the NVIDIA GeForce 8800 GTX, which has small but fast shared memories and floating-point arithmetic hardware that supports single precision only.

### Performance Tuning of N-Body Codes on Modern Microprocessors: I. Direct Integration with a Hermite Scheme on x86_64 Architecture

- Computer Science
- 2006

### A hierarchical O(N log N) force-calculation algorithm

- Physics, Computer Science
- Nature
- 1986

A novel method of directly calculating the force on N bodies that grows only as N log N is described, using a tree-structured hierarchical subdivision of space into cubic cells, each of which is recursively divided into eight subcells whenever more than one particle is found to occupy the same cell.
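
The subdivision rule in this summary can be illustrated with a minimal sketch. It is not the code from the cited paper: the names `Cell`, `insert`, and `count_particles` are hypothetical, particle positions are assumed to be distinct, and only the tree construction is shown (no masses, centers of mass, or force evaluation).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


@dataclass
class Cell:
    """A cubic cell of an octree (hypothetical illustration type)."""
    center: Point
    half: float                                # half the edge length of the cube
    particle: Optional[Point] = None           # the single particle held by a leaf
    children: Optional[List["Cell"]] = None    # eight subcells once subdivided

    def _octant(self, p: Point) -> int:
        # One bit per axis: set when p lies on the upper side of the center.
        return sum(1 << i for i in range(3) if p[i] >= self.center[i])

    def _subdivide(self) -> None:
        # Create the eight subcells, each with half the edge length.
        h = self.half / 2
        self.children = [
            Cell(tuple(c + (h if (k >> i) & 1 else -h)
                       for i, c in enumerate(self.center)), h)
            for k in range(8)
        ]

    def insert(self, p: Point) -> None:
        if self.children is not None:
            # Internal cell: pass the particle down to the matching subcell.
            self.children[self._octant(p)].insert(p)
        elif self.particle is None:
            # Empty leaf: store the particle here.
            self.particle = p
        else:
            # A second particle arrived in the same cell: split into eight.
            if self.particle == p:
                raise ValueError("coincident particles would never separate")
            old, self.particle = self.particle, None
            self._subdivide()
            self.children[self._octant(old)].insert(old)
            self.children[self._octant(p)].insert(p)


def count_particles(cell: Cell) -> int:
    """Count particles stored in the leaves below (and including) cell."""
    if cell.children is None:
        return 0 if cell.particle is None else 1
    return sum(count_particles(c) for c in cell.children)
```

Inserting particles one at a time into a root `Cell` that encloses them all yields a tree in which every leaf holds at most one particle, which is the invariant the summary describes.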

### $7.0/Mflops Astrophysical N-Body Simulation with Treecode on GRAPE-5

- Physics
- ACM/IEEE SC 1999 Conference (SC'99)
- 1999

As an entry for the 1999 Gordon Bell price/performance prize, we report an astrophysical N-body simulation performed with a treecode on GRAPE-5 (Gravity Pipe 5) system, a special-purpose computer for…

### Pentium Pro Inside: I. A Treecode at 430 Gigaflops on ASCI Red, II. Price/Performance of $50/Mflop on Loki and Hyglac

- Computer Science
- ACM/IEEE SC 1997 Conference (SC'97)
- 1997

Two methods of solving the gravitational N-body problem on ASCI Red and two simulations which sustained roughly one Gigaflop on each of two 16 processor Beowulf-class computers constructed entirely from commodity personal computer technology for $50k each in September, 1996 are presented.

### Scan primitives for GPU computing

- Computer Science
- GH '07
- 2007

Using the scan primitives, this work shows novel GPU implementations of quicksort and sparse matrix-vector multiply, and analyzes the performance of the scan primitives, several sort algorithms that use the scan primitives, and a graphical shallow-water fluid simulation using the scan framework for a tridiagonal matrix solver.