A novel multiple-walk parallel algorithm for the Barnes–Hut treecode on GPUs – towards cost effective, high performance N-body simulation

@article{Hamada2009A,
  title={A novel multiple-walk parallel algorithm for the Barnes–Hut treecode on GPUs – towards cost effective, high performance N-body simulation},
  author={Tsuyoshi Hamada and Keigo Nitadori and Khaled Benkrid and Yousuke Ohno and Gentaro Morimoto and Tomonari Masada and Yuichiro Shibata and Kiyoshi Oguri and Makoto Taiji},
  journal={Computer Science - Research and Development},
  year={2009},
  volume={24},
  pages={21--31}
}
Abstract

Recently, general-purpose computation on graphics processing units (GPGPU) has become an increasingly popular field of study as graphics processing units (GPUs) continue to be proposed as high performance and relatively low cost implementation platforms for scientific computing applications. Among these applications figure astrophysical N-body simulations, which form one of the most challenging problems in computational science. However, in most reported studies, a simple $ \mathcal…
Barnes-hut treecode on GPU
  • Hu Jiang, Qianni Deng
  • Computer Science
  • 2010 IEEE International Conference on Progress in Informatics and Computing
  • 2010
TLDR
A new implementation of the tree algorithm on GPU using CUDA, which obtains more than 100X speedup when computing forces between bodies, and proposes a new method to build the tree in this algorithm, making the performance even better.
A sparse octree gravitational N-body code that runs entirely on the GPU processor
TLDR
The gravitational tree-code outperforms tuned CPU code during the tree-construction and shows a performance improvement of more than a factor 20 overall, resulting in a processing rate of more than 2.8 million particles per second.
An Efficient CUDA Implementation of the Tree-Based Barnes Hut n-Body Algorithm
TLDR
This chapter describes the first CUDA implementation of the classical Barnes Hut n-body algorithm that runs entirely on the GPU, concluding that GPUs can be used to accelerate irregular codes, not just regular codes.
42 TFlops hierarchical N-body simulations on GPUs with applications in both astrophysics and turbulence
TLDR
The present method calculates the O(N log N) treecode and O(N) fast multipole method (FMM) on the GPUs with unprecedented efficiency and demonstrates the performance of the method by choosing one standard application (a gravitational N-body simulation) and one non-standard application (simulation of turbulence using vortex particles).
A novel parallel algorithm for near-field computation in N-body problem on GPU
A novel efficient parallel algorithm for the near-field computation in N-body problem on the Graphics Processing Unit (GPU) architecture is proposed in this paper. This algorithm evolved from the BPB…
The algorithm mapping of the near-field computation in N-body problem on GPU
This paper discusses the principle of mapping an algorithm efficiently onto the Graphics Processing Unit (GPU) architecture from the aspects of task partition and data access by researching the…
190 TFlops Astrophysical N-body Simulation on a Cluster of GPUs
  • T. Hamada, Keigo Nitadori
  • Computer Science
  • 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
  • 2010
TLDR
The results of a hierarchical N-body simulation on DEGIMA, a cluster of PCs with 576 graphics processing units (GPUs) and an InfiniBand interconnect, are presented.
On the parallelization and performance analysis of Barnes–Hut algorithm using Java parallel platforms
Multi-core processors provide time-efficient and cost-effective solutions to execute the algorithms for complex physical systems. However, to efficiently exploit the processing capabilities of the…
Parallel time-space processing model based fast N-body simulation on GPUs
TLDR
A novel parallel implementation of N-body gravitational simulation on GPUs is presented, and the experimental results show that this method achieves a 413-fold acceleration compared with the CPU, and an acceleration of up to 5.5 times compared with other GPU-based methods.
Petascale turbulence simulation using a highly parallel fast multipole method on GPUs
TLDR
Large-scale direct numerical simulations of homogeneous-isotropic fluid turbulence, achieving sustained performance of 1.08 petaflop/s on GPU hardware using single precision, exceed by an order of magnitude the largest vortex-method calculations to date.

References

SHOWING 1-10 OF 30 REFERENCES
High Performance Direct Gravitational N-body Simulations on Graphics Processing Units
TLDR
The results of gravitational direct N-body simulations using the graphics processing unit (GPU) on a commercial NVIDIA GeForce 8800GTX designed for gaming computers show that modern GPUs offer an attractive alternative to GRAPE-6Af special purpose hardware.
High-performance direct gravitational N-body simulations on graphics processing units
We present the results of gravitational direct N-body simulations using the commercial graphics processing units (GPU) NVIDIA Quadro FX1400 and GeForce 8800GTX, and compare the results with…
GPGPU: general-purpose computation on graphics hardware
The graphics processor (GPU) on today's commodity video cards has evolved into an extremely powerful and flexible processor. Modern graphics architectures provide tremendous memory bandwidth and…
The Chamomile Scheme: An Optimized Algorithm for N-body simulations on Programmable Graphics Processing Units
We present an algorithm named "Chamomile Scheme". The scheme is fully optimized for calculating gravitational interactions on the latest programmable Graphics Processing Unit (GPU), NVIDIA…
Performance Tuning of N-Body Codes on Modern Microprocessors: I. Direct Integration with a Hermite Scheme on x86_64 Architecture
The main performance bottleneck of gravitational N-body codes is the force calculation between two particles. We have succeeded in speeding up this pair-wise force calculation by factors…
A hierarchical O(N log N) force-calculation algorithm
TLDR
A novel method of directly calculating the force on N bodies that grows only as N log N is described, using a tree-structured hierarchical subdivision of space into cubic cells, each of which is recursively divided into eight subcells whenever more than one particle is found to occupy the same cell.
$7.0/Mflops Astrophysical N-Body Simulation with Treecode on GRAPE-5
As an entry for the 1999 Gordon Bell price/performance prize, we report an astrophysical N-body simulation performed with a treecode on GRAPE-5 (Gravity Pipe 5) system, a special-purpose computer for…
Pentium Pro Inside: I. A Treecode at 430 Gigaflops on ASCI Red, II. Price/Performance of $50/Mflop on Loki and Hyglac
TLDR
Two methods of solving the gravitational N-body problem on ASCI Red and two simulations which sustained roughly one Gigaflop on each of two 16 processor Beowulf-class computers constructed entirely from commodity personal computer technology for $50k each in September, 1996 are presented.
Scan primitives for GPU computing
TLDR
Using the scan primitives, this work shows novel GPU implementations of quicksort and sparse matrix-vector multiply, and analyzes the performance of the scan primitives, several sort algorithms that use the scan primitives, and a graphical shallow-water fluid simulation using the scan framework for a tridiagonal matrix solver.
A modified tree code: don't laugh; it runs
TLDR
A modification of the Barnes-Hut tree algorithm is described together with a series of numerical tests of this method to improve the performance of the code on heavily vector-oriented machines such as the Cyber 205 by exploiting the fact that nearby particles tend to have very similar interaction lists.