Global Futures: A Multithreaded Execution Model for Global Arrays-based Applications
With supercomputers anticipated to expand from thousands to millions of cores, one of the challenges facing scientists is how to use this ever-increasing number of cores effectively. We report an approach that creates a heterogeneous decomposition by partitioning effort according to the scaling properties of the component algorithms. We demonstrate the strategy by developing a capability to model hot dense plasma. We have performed benchmark calculations ranging from millions to billions of charged particles, including a 2.8-billion-particle simulation that achieved 259.9 TFlop/s (26% of peak performance) on the 294,912-core JUGENE computer at the Jülich Supercomputing Centre in Germany. With this unprecedented simulation capability we have begun an investigation of plasma fusion physics under conditions where both theory and experiment are lacking: the strongly coupled regime in which the plasma begins to burn. Our strategy is applicable to other problems involving long-range forces (e.g., biological or astrophysical simulations). We believe that the flexible heterogeneous decomposition approach demonstrated here will allow many problems to scale across current and next-generation machines.