Sailfish: A flexible multi-GPU implementation of the lattice Boltzmann method

  • Michał Januszewski, Marcin Kostur
  • Comput. Phys. Commun.

Design and Optimizations of Lattice Boltzmann Methods for Massively Parallel GPU-Based Clusters

The authors describe the structure of the code and discuss in detail several key design choices that were guided by theoretical performance models and experimental benchmarks, with both single-GPU codes and massively parallel implementations on commodity GPU clusters in mind.

Multi-GPU thermal lattice Boltzmann simulations using OpenACC and MPI

  • Ao Xu, Bo Li
  • Computer Science, Physics
    International Journal of Heat and Mass Transfer
  • 2023

Regularized lattice Boltzmann method parallel model on heterogeneous platforms

An RLBM parallel model on the CPU/GPU heterogeneous platforms is proposed to solve the problem of possible GPU memory shortage and is extended to a multi‐GPU version, which is also applied to the 3D lid‐driven cavity flow.

An Out-of-Core Method for Physical Simulations on a Multi-GPU Architecture Using Lattice Boltzmann Method

  • J. Duchateau, F. Rousselle, N. Maquignon, G. Roussel, C. Renaud
  • Computer Science
    2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld)
  • 2016
This paper proposes an efficient method for exchanging data between GPUs and CPU main memory, which makes it possible to run fast, complex simulations on large installations. Combined with the massive parallelism of GPUs, the method maintains good simulation performance.

Evaluation of a Directive-Based GPU Programming Approach for High-Order Unstructured Mesh Computational Fluid Dynamics

This work finds that sparse matrix-vector multiplication with OpenCL is faster than using OpenACC with cuBLAS, and that the directive-based approach offered by OpenACC results in a flexible, unified, and hence smaller code base that is easier to maintain, is readily portable, and promotes algorithm development.

Data-Oriented Language Implementation of Lattice-Boltzmann Method for Dense and Sparse Geometries

This work analyses the performance of an implementation based on a new approach called the data-oriented language, which allows complex memory access patterns to be combined with simple source code, and provides the source code of a solver for the D2Q9 lattice.

Evaluation of a performance portable lattice Boltzmann code using OpenCL

Results show that, contrary to conventional wisdom, using OpenCL it is possible to achieve a high degree of performance portability, at least for 3D lattice Boltzmann codes, using a set of straightforward techniques.

Physically based visual simulation of the Lattice Boltzmann method on the GPU: a survey

An up-to-date survey on the research regarding the LBM for fluid simulation using GPUs is given, discussing how the method was implemented with different GPU architectures and software frameworks, focusing on optimization techniques and their performance.

Cross-Platform GPU-Based Implementation of Lattice Boltzmann Method Solver Using ArrayFire Library

The solver leverages ArrayFire’s just-in-time compilation engine for compiling high-level code into optimized kernels for both CUDA and OpenCL GPU backends and it is shown that it is possible to produce fast cross-platform lattice Boltzmann method simulations with minimal code.



Accelerating Lattice Boltzmann Fluid Flow Simulations Using Graphics Processors

This paper improves upon prior single-precision GPU LBM results for the D3Q19 model by increasing GPU multiprocessor occupancy, resulting in an increase in maximum performance by 20%, and by introducing a space-efficient storage method which reduces GPU RAM requirements by 50% at a slight detriment to performance.

A new approach to the lattice Boltzmann method for graphics processing units

Global Memory Access Modelling for Efficient Implementation of the Lattice Boltzmann Method on Graphics Processing Units

The main goal is to facilitate optimisation of regular data-parallel applications on GPUs by forming a model capable of estimating the execution time for a large class of applications.

LBM based flow simulation using GPU computing processor

Graphics processing unit implementation of lattice Boltzmann models for flowing soft systems.

A graphic processing unit (GPU) implementation of the multicomponent lattice Boltzmann equation with multirange interactions for soft-glassy materials is presented, considerably expanding the scope of the glassy LB toward the investigation of long-time relaxation properties of soft-flowing glassy materials.

Implementation of a Lattice Boltzmann kernel using the Compute Unified Device Architecture developed by nVIDIA

  • J. Tölke
  • Computer Science
    Comput. Vis. Sci.
  • 2010
In this article a very efficient implementation of a 2D-Lattice Boltzmann kernel using the Compute Unified Device Architecture (CUDA™) interface developed by nVIDIA® is presented. By exploiting the

TeraFLOP computing on a desktop PC with GPUs for 3D CFD

A very efficient implementation of a lattice Boltzmann (LB) kernel in 3D on a graphical processing unit using the compute unified device architecture interface developed by nVIDIA is presented. By

Accelerating numerical solution of stochastic differential equations with CUDA