Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory

@inproceedings{Wittmann2010MulticoreawarePT,
  title={Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory},
  author={M. Wittmann and G. Hager and G. Wellein},
  booktitle={2010 IEEE International Symposium on Parallel \& Distributed Processing, Workshops and Phd Forum (IPDPSW)},
  year={2010},
  pages={1--7}
}
  • M. Wittmann, G. Hager, G. Wellein
  • Published 2010
  • Computer Science
  • 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)
  • New algorithms and optimization techniques are needed to balance the accelerating trend towards bandwidth-starved multicore chips. It is well known that the performance of stencil codes can be improved by temporal blocking, lessening the pressure on the memory interface. We introduce a new pipelined approach that makes explicit use of shared caches in multicore environments and minimizes synchronization and boundary overhead. For clusters of shared-memory nodes we demonstrate how temporal…
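The abstract names pipelined temporal blocking without showing it. The following is a minimal serial sketch, assuming a 1D three-point Jacobi stencil with fixed boundary values, of how a pipeline of stages can apply several time steps to a block of the grid before moving on, instead of streaming the whole grid once per step. The function names, the block size, and the two-block lag between stages are illustrative assumptions, not taken from the paper; each stage stands in for one thread that would share a cache with its neighbours.

```python
"""Serial emulation of pipelined (wavefront) temporal blocking for a 1D
three-point Jacobi stencil. Hypothetical sketch, not the paper's code."""


def jacobi_naive(u0, steps):
    """Reference version: sweep the whole grid once per time step."""
    u = list(u0)
    for _ in range(steps):
        new = list(u)  # boundary cells u[0] and u[-1] stay fixed
        for i in range(1, len(u) - 1):
            new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
        u = new
    return u


def jacobi_pipelined(u0, steps, block):
    """Stage k advances time level k -> k+1 and trails stage k-1 by `lag` blocks.

    With lag = 2, the stages active in one pipeline step read and write
    disjoint cells, so a threaded version could run them concurrently with
    one barrier per step. For clarity every time level gets its own array;
    a cache-efficient version would keep only the blocks in flight.
    """
    n = len(u0)
    levels = [list(u0) for _ in range(steps + 1)]
    nblocks = -(-(n - 2) // block)  # ceil over the interior cells 1..n-2
    lag = 2  # assumed distance (in blocks) between consecutive stages
    for step in range(nblocks + lag * (steps - 1)):
        for k in range(steps):  # one emulated "thread" per time level
            b = step - lag * k
            if 0 <= b < nblocks:
                src, dst = levels[k], levels[k + 1]
                lo = 1 + b * block
                hi = min(lo + block, n - 1)
                for i in range(lo, hi):
                    dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0
    return levels[steps]


if __name__ == "__main__":
    grid = [0.0] * 20
    grid[0], grid[-1] = 1.0, 2.0
    ref = jacobi_naive(grid, 4)
    out = jacobi_pipelined(grid, 4, block=3)
    print(max(abs(a - b) for a, b in zip(ref, out)))
```

Because each stage reads its left and right halo cells only after the preceding stage has finished the neighbouring blocks, the pipelined result matches the naive sweep exactly; the gain in a real implementation comes from the fact that a block is reused for several time steps while it is still resident in a shared cache.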
    44 Citations
    NUMA Aware Iterative Stencil Computations on Many-Core Systems
    • M. Shaheen, R. Strzodka
    • Computer Science
    • 2012 IEEE 26th International Parallel and Distributed Processing Symposium
    • 2012
    Locality-Aware Stencil Computations Using Flash SSDs as Main Memory Extension
    • H. Midorikawa, Hideyuki Tan
    • Computer Science
    • 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
    • 2015
    Applying Recursive Temporal Blocking for Stencil Computations to Deeper Memory Hierarchy
    • Toshio Endo
    • Computer Science
    • 2018 IEEE 7th Non-Volatile Memory Systems and Applications Symposium (NVMSA)
    • 2018
    LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments