Data access optimizations for highly threaded multi-core CPUs with multiple memory controllers

@article{Hager2007DataAO,
  title={Data access optimizations for highly threaded multi-core CPUs with multiple memory controllers},
  author={Georg Hager and Thomas Zeiser and Gerhard Wellein},
  journal={2008 IEEE International Symposium on Parallel and Distributed Processing},
  year={2007},
  pages={1-7}
}
Processor and system architectures that feature multiple memory controllers are prone to show bottlenecks and erratic performance numbers on codes with regular access patterns. Although such effects are well known in the form of cache thrashing and aliasing conflicts, they become more severe when memory access is involved. Using the new Sun UltraSPARC T2 processor as a prototypical multi-core design, we analyze performance patterns in low-level and application benchmarks and show ways to…

Citations

Publications citing this paper.
SHOWING 1-10 OF 21 CITATIONS

A brief survey of differential evolution on Graphic Processing Units

  • 2013 IEEE Symposium on Differential Evolution (SDE)
  • 2013

Genetic algorithm for clustering accelerated by the CUDA platform

  • 2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC)
  • 2012

References

Publications referenced by this paper.
SHOWING 1-10 OF 10 REFERENCES

C++ programming techniques for High Performance Computing on systems with non-uniform memory access using OpenMP

H. Stengel
  • Diploma thesis, University of Applied Sciences Nuremberg
  • 2007

Optimization of Sparse Matrix-vector Multiplication on Emerging Multicore Platforms

S. W. Williams, L. Oliker, R. Vuduc, K. Yelick, J. Demmel, J. Shalf
  • Proceedings of SC07, Reno, Nevada, Nov. 10–16
  • 2007

Performance comparison of different parallel lattice Boltzmann implementations on multicore multi-socket systems

S. Donath, K. Iglberger, G. Wellein, T. Zeiser, A. Nitsure, U. Rüde
  • Accepted for publication in Int. J. Comp. Sci. Eng.
  • 2007

On the Single Processor Performance of Simple Lattice Boltzmann Kernels

G. Wellein, T. Zeiser, S. Donath, G. Hager
  • Computers & Fluids 35, 910–919
  • 2006

Scientific Supercomputing: Architecture and Use of Shared and Distributed Memory Parallel Computers

W. Schönauer
  • Self-edition, Karlsruhe
  • 2000

Segmented Iterators and Hierarchical Algorithms

M. H. Austern
  • Generic programming: International Seminar on Generic Programming, Dagstuhl Castle, Germany
  • 1998
