A novel multiple-walk parallel algorithm for the Barnes–Hut treecode on GPUs – towards cost effective, high performance N-body simulation

@article{Hamada2009A,
  title={A novel multiple-walk parallel algorithm for the {Barnes--Hut} treecode on {GPUs} -- towards cost effective, high performance {N}-body simulation},
  author={Tsuyoshi Hamada and Keigo Nitadori and Khaled Benkrid and Yousuke Ohno and Gentaro Morimoto and Tomonari Masada and Yuichiro Shibata and Kiyoshi Oguri and Makoto Taiji},
  journal={Computer Science - Research and Development},
  year={2009},
  volume={24},
  pages={21--31}
}
Abstract: Recently, general-purpose computation on graphics processing units (GPGPU) has become an increasingly popular field of study as graphics processing units (GPUs) continue to be proposed as high performance and relatively low cost implementation platforms for scientific computing applications. Among these applications figure astrophysical N-body simulations, which form one of the most challenging problems in computational science. However, in most reported studies, a simple $ \mathcal…

Barnes-Hut treecode on GPU

  • Hu Jiang, Qianni Deng
  • Computer Science
    2010 IEEE International Conference on Progress in Informatics and Computing
  • 2010
A new implementation of the tree algorithm on the GPU using CUDA, which obtains more than a 100× speedup when computing forces between bodies, and proposes a new tree-construction method that improves performance further.

An Efficient CUDA Implementation of the Tree-Based Barnes Hut n-Body Algorithm

42 TFlops hierarchical N-body simulations on GPUs with applications in both astrophysics and turbulence

The present method computes the O(N log N) treecode and the O(N) fast multipole method (FMM) on GPUs with unprecedented efficiency, and demonstrates its performance on one standard application (a gravitational N-body simulation) and one non-standard application (a simulation of turbulence using vortex particles).

A novel parallel algorithm for near-field computation in N-body problem on GPU

A novel, efficient parallel algorithm for the near-field computation in the N-body problem on the Graphics Processing Unit (GPU) architecture is proposed, based on Newton's third law and the Z-order space-filling curve.
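The summary names two ingredients without detail. As a rough illustration only, the Python/NumPy sketch below sorts particles along a Z-order (Morton) curve and then evaluates softened gravity between pairs closer than a cutoff, visiting each pair once and applying equal and opposite forces to both bodies, as Newton's third law allows. The function names, G = 1, the softening length `eps`, the 10-bit grid and the cutoff radius are assumptions for illustration; this is not the paper's GPU algorithm, and here the Morton sort only changes memory order, not the result.

```python
import numpy as np


def morton3d(ix: int, iy: int, iz: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of three grid coordinates into a Z-order key."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)
        code |= ((iy >> b) & 1) << (3 * b + 1)
        code |= ((iz >> b) & 1) << (3 * b + 2)
    return code


def zorder_permutation(pos: np.ndarray, bits: int = 10) -> np.ndarray:
    """Indices that sort particles along a Z-order curve over their bounding box."""
    lo = pos.min(axis=0)
    span = np.maximum(pos.max(axis=0) - lo, 1e-12)
    cells = ((pos - lo) / span * ((1 << bits) - 1)).astype(np.int64)
    keys = [morton3d(int(x), int(y), int(z), bits) for x, y, z in cells]
    return np.argsort(keys, kind="stable")


def near_field_forces(pos, mass, cutoff, eps=1e-3):
    """Softened gravitational forces (G = 1) between pairs closer than `cutoff`.

    Each unordered pair is visited once: the force on body i from body j is
    added to i and subtracted from j (Newton's third law).
    """
    order = zorder_permutation(pos)
    p, m = pos[order], mass[order]          # particles in Z-order
    f = np.zeros_like(p)
    for i in range(len(p)):
        d = p[i + 1:] - p[i]                # vectors from body i to bodies j > i
        r2 = np.einsum("ij,ij->i", d, d)
        near = np.nonzero(r2 < cutoff * cutoff)[0]
        w = m[i] * m[i + 1 + near] / (r2[near] + eps * eps) ** 1.5
        fij = d[near] * w[:, None]          # force on i from each nearby j
        f[i] += fij.sum(axis=0)
        f[i + 1 + near] -= fij              # equal and opposite force on each j
    out = np.empty_like(f)
    out[order] = f                          # undo the Z-order permutation
    return out
```

Because every pair contributes +F and -F, the forces sum to zero up to rounding. A real near-field kernel would also use the spatial ordering to skip distant blocks of particles rather than scanning every pair as this sketch does.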

The algorithm mapping of the near-field computation in N-body problem on GPU

This paper discusses the principle of mapping algorithms efficiently onto the Graphics Processing Unit (GPU) architecture from the aspects of task partition and data access by researching the…

190 TFlops Astrophysical N-body Simulation on a Cluster of GPUs

  • T. Hamada, K. Nitadori
  • Computer Science, Physics
    2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
  • 2010
The results of a hierarchical N-body simulation on DEGIMA, a cluster of PCs with 576 graphics processing units (GPUs) and an InfiniBand interconnect, are presented.

On the parallelization and performance analysis of Barnes–Hut algorithm using Java parallel platforms

Multi-core processors provide time-efficient and cost-effective solutions to execute the algorithms for complex physical systems. However, to efficiently exploit the processing capabilities of the…

Parallel time-space processing model based fast N-body simulation on GPUs

A novel parallel implementation of N-body gravitational simulation on GPUs is presented; experimental results show that the method achieves a 413-fold speedup over the CPU and a speedup of up to 5.5 times over other GPU-based methods.

Petascale turbulence simulation using a highly parallel fast multipole method on GPUs

References


GPGPU: general-purpose computation on graphics hardware

The graphics processor (GPU) on today's commodity video cards has evolved into an extremely powerful and flexible processor. Modern graphics architectures provide tremendous memory bandwidth and…

The Chamomile Scheme: An Optimized Algorithm for N-body simulations on Programmable Graphics Processing Units

An algorithm named the "Chamomile Scheme" is presented, fully optimized for calculating gravitational interactions on the latest programmable Graphics Processing Unit (GPU), the NVIDIA GeForce 8800 GTX, which has small but fast shared memories and floating-point arithmetic hardware that supports only single precision.
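The summary does not describe the Chamomile Scheme's internals. For orientation only, the sketch below shows the generic tiling pattern that direct-summation GPU kernels commonly use with small shared memories: every i-particle reads one tile of j-particles at a time, and all arithmetic stays in single precision to mirror the hardware described above. The tile size, softening `eps`, G = 1 and the function name are hypothetical, and NumPy vectorization stands in for GPU threads; this is not the authors' algorithm.

```python
import numpy as np


def tiled_direct_accel(pos, mass, tile=64, eps=1e-2):
    """All-pairs softened gravitational acceleration (G = 1) in float32.

    j-particles are processed one tile at a time, the way a GPU kernel stages
    them through a small shared memory before every thread uses them.
    """
    pos = pos.astype(np.float32)
    mass = mass.astype(np.float32)
    eps2 = np.float32(eps * eps)
    acc = np.zeros_like(pos)
    for start in range(0, len(pos), tile):       # one "shared-memory" tile
        pj = pos[start:start + tile]
        mj = mass[start:start + tile]
        d = pj[None, :, :] - pos[:, None, :]     # (n, tile, 3) separations
        r2 = (d * d).sum(axis=2) + eps2          # softening also removes i == j
        inv_r3 = r2 ** np.float32(-1.5)
        acc += (d * (mj * inv_r3)[:, :, None]).sum(axis=1)
    return acc
```

The self-interaction needs no special case here: for i == j the separation is zero, so the softened term contributes nothing. The tile size only changes summation order, so results for different tiles agree to single-precision rounding.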

A hierarchical O(N log N) force-calculation algorithm

A novel method of directly calculating the force on N bodies that grows only as N log N is described. It uses a tree-structured hierarchical subdivision of space into cubic cells, each of which is recursively divided into eight subcells whenever more than one particle is found to occupy the same cell.
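A compact Python sketch of the scheme summarized above may help: cubic cells split into eight children when a second particle arrives, each cell stores its total mass and center of mass, and a cell of side s at distance d from a body is treated as a single point mass when s < θ·d. The opening parameter θ, the softening `eps`, G = 1, and the class and function names are assumptions for illustration. The sketch assumes positive masses and distinct particle positions (coincident particles would subdivide forever), and it is not the implementation from any paper listed here.

```python
import numpy as np


class Cell:
    """A cubic cell of a Barnes-Hut octree."""

    def __init__(self, center, half):
        self.center = np.asarray(center, dtype=float)
        self.half = half            # half of the cell's side length
        self.count = 0              # number of bodies inside the cell
        self.body = None            # body index when the cell is a one-body leaf
        self.children = None        # eight subcells once the cell is divided
        self.mass = 0.0
        self.com = np.zeros(3)      # center of mass

    def _child_for(self, p):
        k = sum(int(p[a] > self.center[a]) << a for a in range(3))
        return self.children[k]

    def _subdivide(self):
        h = self.half / 2
        self.children = [
            Cell(self.center + h * (2 * np.array([(k >> a) & 1 for a in range(3)]) - 1), h)
            for k in range(8)
        ]

    def insert(self, i, pos):
        if self.count == 0:
            self.body = i                        # empty cell becomes a leaf
        else:
            if self.children is None:            # second body: split into eight
                self._subdivide()
                old, self.body = self.body, None
                self._child_for(pos[old]).insert(old, pos)
            self._child_for(pos[i]).insert(i, pos)
        self.count += 1

    def summarize(self, pos, mass):
        """Compute total mass and center of mass bottom-up."""
        if self.children is None:
            if self.body is not None:
                self.mass, self.com = mass[self.body], pos[self.body].copy()
            return
        for c in self.children:
            c.summarize(pos, mass)
        self.mass = sum(c.mass for c in self.children)
        self.com = sum(c.mass * c.com for c in self.children) / self.mass

    def accel(self, i, pos, theta, eps):
        """Acceleration on body i from the bodies in this cell."""
        if self.count == 0 or (self.children is None and self.body == i):
            return np.zeros(3)
        d = self.com - pos[i]
        dist = np.sqrt(d @ d)
        if self.children is None or 2 * self.half < theta * dist:
            r2 = d @ d + eps * eps               # softened distance squared
            return self.mass * d / r2 ** 1.5     # point-mass approximation
        return sum(c.accel(i, pos, theta, eps) for c in self.children)


def barnes_hut_accel(pos, mass, theta=0.5, eps=1e-3):
    lo, hi = pos.min(axis=0), pos.max(axis=0)
    root = Cell((lo + hi) / 2, max((hi - lo).max() / 2, 1e-12))
    for i in range(len(pos)):
        root.insert(i, pos)
    root.summarize(pos, mass)
    return np.array([root.accel(i, pos, theta, eps) for i in range(len(pos))])
```

With θ = 0 every cell is opened and the sketch reduces to direct summation, which makes a convenient correctness check. For θ no larger than 1/√3, a cell that contains body i is always opened, so a body never interacts with a monopole that includes its own mass.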

$7.0/Mflops Astrophysical N-Body Simulation with Treecode on GRAPE-5

As an entry for the 1999 Gordon Bell price/performance prize, we report an astrophysical N-body simulation performed with a treecode on GRAPE-5 (Gravity Pipe 5) system, a special-purpose computer for…

Pentium Pro Inside: I. A Treecode at 430 Gigaflops on ASCI Red, II. Price/Performance of $50/Mflop on Loki and Hyglac

Two methods of solving the gravitational N-body problem on ASCI Red are presented, along with two simulations that sustained roughly one gigaflop on each of two 16-processor Beowulf-class computers constructed entirely from commodity personal computer technology for $50k each in September 1996.

Scan primitives for GPU computing

Using the scan primitives, this work presents novel GPU implementations of quicksort and sparse matrix-vector multiply, and analyzes the performance of the scan primitives, of several sort algorithms that use them, and of a graphical shallow-water fluid simulation that uses the scan framework for a tridiagonal matrix solver.
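For readers unfamiliar with scan, the sketch below implements one common work-efficient formulation of an exclusive scan (the up-sweep/down-sweep scheme usually attributed to Blelloch), with serial loops standing in for the steps a GPU would run in parallel. It then uses the scan for stream compaction, a typical scan application. The function names and the power-of-two padding are illustrative assumptions; this is not the code from the paper.

```python
import operator


def exclusive_scan(xs, op=operator.add, identity=0):
    """Work-efficient exclusive scan: out[k] = xs[0] op ... op xs[k-1]."""
    n = 1
    while n < len(xs):
        n *= 2                                  # pad to a power of two
    a = list(xs) + [identity] * (n - len(xs))
    d = 1
    while d < n:                                # up-sweep: partial sums in a tree
        for i in range(2 * d - 1, n, 2 * d):    # iterations are independent (parallel on a GPU)
            a[i] = op(a[i - d], a[i])
        d *= 2
    a[n - 1] = identity
    d = n // 2
    while d >= 1:                               # down-sweep: push prefixes back down
        for i in range(2 * d - 1, n, 2 * d):
            left = a[i - d]
            a[i - d] = a[i]                     # left child gets the parent's prefix
            a[i] = op(a[i], left)               # right child appends the left subtree sum
        d //= 2
    return a[:len(xs)]


def compact(values, keep):
    """Stream compaction: scatter kept values to slots computed by the scan."""
    flags = [1 if k else 0 for k in keep]
    slots = exclusive_scan(flags)
    out = [None] * ((slots[-1] + flags[-1]) if flags else 0)
    for v, f, j in zip(values, flags, slots):
        if f:
            out[j] = v
    return out
```

For example, `exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3])` returns `[0, 3, 4, 11, 11, 15, 16, 22]`, and `compact([5, 6, 7, 8], [True, False, True, True])` returns `[5, 7, 8]`. The operator must be associative and `identity` must be its identity element; the down-sweep keeps operand order, so non-commutative operators such as string concatenation also work.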

Avalon: an Alpha/Linux cluster achieves 10 Gflops for $15k

As an entry for the 1998 Gordon Bell price/performance prize, we present two calculations from the disciplines of condensed matter physics and astrophysics. The simulations were performed on a 70…