A parallel hashed oct-tree N-body algorithm

@article{Warren1993APH,
  title={A parallel hashed oct-tree N-body algorithm},
  author={Michael S. Warren and John K. Salmon},
  journal={Supercomputing '93. Proceedings},
  year={1993},
  pages={12--21}
}
The authors report on an efficient adaptive N-body method which they have recently designed and implemented. The algorithm computes the forces on an arbitrary distribution of bodies in a time which scales as N log N with the particle number. The accuracy of the force calculations is analytically bounded, and can be adjusted via a user-defined parameter from a few percent relative accuracy down to machine arithmetic accuracy. Instead of using pointers to indicate the topology of the tree, the… 
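The abstract is cut off just as it begins to describe the pointer-free tree representation. As a rough illustration only, here is a minimal Python sketch of one common way to address oct-tree cells by key and store them in a hash table: Morton (bit-interleaved) keys with a leading placeholder bit. The key layout and the names `morton_key`, `parent`, `children`, `level`, and `build_table` are assumptions made for this sketch and are not taken from the truncated abstract.

```python
def morton_key(ix, iy, iz, bits):
    """Interleave the bits of three integer coordinates into one key.

    A leading placeholder 1 bit is prepended so that the length of the
    key encodes the depth of the cell (assumed layout for this sketch).
    """
    key = 1  # placeholder bit; the root cell is key 1
    for b in range(bits - 1, -1, -1):
        octant = (((ix >> b) & 1) << 2) | (((iy >> b) & 1) << 1) | ((iz >> b) & 1)
        key = (key << 3) | octant
    return key


def parent(key):
    """Key of the parent cell: drop the last three bits."""
    return key >> 3


def children(key):
    """Keys of the eight possible daughter cells."""
    return [(key << 3) | i for i in range(8)]


def level(key):
    """Depth of a cell in the tree; the root (key 1) is level 0."""
    return (key.bit_length() - 1) // 3


def build_table(points, bits):
    """Hash table mapping each occupied cell key to its body count.

    `points` holds integer (ix, iy, iz) triples in [0, 2**bits).
    Every ancestor of a leaf cell is entered too, so tree topology is
    recovered by key arithmetic plus a dictionary lookup, not pointers.
    """
    table = {}
    for ix, iy, iz in points:
        key = morton_key(ix, iy, iz, bits)
        while key >= 1:
            table[key] = table.get(key, 0) + 1
            key >>= 3
    return table
```

With this layout, finding a cell's parent or daughters is integer arithmetic followed by a dictionary lookup, which is what makes a key usable as a global cell name across processors.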

Implementation of a parallel tree method on a GPU
N-Body Simulations Using Message Passing Parallel Computers
TLDR
New parallel formulations of the Barnes-Hut method are presented for n-body simulations on message-passing computers that partition the domain efficiently, incurring minimal communication overhead, in contrast to existing schemes that are based on sorting a large number of keys or on the use of global data structures.
A sparse octree gravitational N-body code that runs entirely on the GPU processor
A Data-Parallel Implementation of O(N) Hierarchical N-Body Methods
  • Yu Hu, S. Johnsson
  • Computer Science
    Proceedings of the 1996 ACM/IEEE Conference on Supercomputing
  • 1996
TLDR
A data-parallel implementation of Anderson's method is presented and both efficiency and scalability of the implementation on the Connection Machine CM-5/5E systems are demonstrated.
The Parallel Implementation of N-body Algorithms
TLDR
The atomic message model is presented, motivated by the problem of transferring large messages in a system with limited communication resources and bandwidth at each node, and it is shown that simple randomized protocols nonetheless provide high communication throughput.
2HOT: An improved parallel hashed oct-tree N-body algorithm for cosmological simulation
We report on improvements made over the past two decades to our adaptive treecode N-body method (HOT). A mathematical and computational approach to the cosmological N-body problem is described, with
2HOT: An improved parallel hashed oct-tree N-Body algorithm for cosmological simulation
  • Michael S. Warren
  • Physics
    2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
  • 2013
TLDR
These results include the first simulations using the new constraints on the standard model of cosmology from the Planck satellite and set a new standard for accuracy and scientific throughput, while meeting or exceeding the computational efficiency of the latest generation of hybrid TreePM N-body methods.
An Efficient Load Balancing Technique for Parallel FMA in Message Passing Environment
TLDR
A new partitioning technique called weighted subtrees is proposed, and its performance results are presented.
An evaluation of computing paradigms for N-body simulations on distributed memory architectures
TLDR
This work examines inefficiencies in the implementation of HPF, determines that most of the extra overhead is due to a single aspect of the communication strategy, and demonstrates that fixing the communication strategy can bring the overheads of the HPF application to within 25% of those of the hand-coded version.
Bottom-Up Construction and 2:1 Balance Refinement of Linear Octrees in Parallel
TLDR
New parallel algorithms for the construction and 2:1 balance refinement of large linear octrees on distributed memory machines, used in many problems in computational science and engineering, are proposed.

References

Parallel hierarchical N-body methods
TLDR
It is shown how the BH algorithm can be adapted to execute in parallel, and the performance of the parallel version of the algorithm is analyzed, finding that the overhead is due primarily to interprocessor synchronization delays and redundant computation.
An Efficient Program for Many-Body Simulation
TLDR
This paper describes both the particular program and the methodology underlying such speedups that reduced the running time of a large problem $(N = 10,000)$ by a factor of four hundred.
An efficient N-body algorithm for a fine-grain parallel computer
TLDR
An N-body algorithm for a parallel computer combining the generality of direct-summation codes with the favorable scaling properties of FFT and multipole-expansion codes is described.
A hierarchical O(N log N) force-calculation algorithm
TLDR
A novel method of directly calculating the force on N bodies that grows only as N log N is described, using a tree-structured hierarchical subdivision of space into cubic cells, each of which is recursively divided into eight subcells whenever more than one particle is found to occupy the same cell.
The Parallel Multipole Method on the Connection Machine
TLDR
This paper reports on a fast implementation of the three-dimensional nonadaptive Parallel Multipole Method (PMM) on the Connection Machine system model CM–2, modeled by a hierarchy of three-dimensional grids forming a pyramid in which parent nodes have degree eight.
Skeletons from the treecode closet
TLDR
It is found that the conventional Barnes-Hut MAC can introduce potentially unbounded errors unless θ < 1/√3, and that this behavior, while rare, is demonstrable in astrophysically reasonable examples.
Fast Parallel Tree Codes for Gravitational and Fluid Dynamical N-Body Problems
TLDR
Two physical systems from separate disciplines that make use of the same algorithmic and mathematical structures to reduce the number of operations necessary to complete a realistic simulation of the gravitational N-body problem and the simulation of incompressible flows are discussed.
Abstractions for parallel N-body simulations
TLDR
Introduces C++ programming abstractions for maintaining load-balanced partitions of irregular and adaptive trees that substantially reduce the programming complexity and the overhead for distributed memory architectures.