Importance of explicit vectorization for CPU and GPU software performance

@article{Dickson2011ImportanceOE,
  title={Importance of explicit vectorization for CPU and GPU software performance},
  author={N. Dickson and K. Karimi and F. Hamze},
  journal={J. Comput. Phys.},
  year={2011},
  volume={230},
  pages={5383--5398}
}
Much of the current focus in high-performance computing is on multi-threading, multi-computing, and graphics processing unit (GPU) computing. However, vectorization and non-parallel optimization techniques, which can often be employed additionally, are less frequently discussed. In this paper, we present an analysis of several optimizations done on both central processing unit (CPU) and GPU implementations of a particular computationally intensive Metropolis Monte Carlo algorithm. Explicit…
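The abstract contrasts explicit vectorization with multi-threading. The following is a minimal sketch, not the authors' code or model. It assumes an x86 target with SSE and a hypothetical periodic 1D Ising chain with uniform coupling `J` and field `h`. For every site it computes the Metropolis energy change for flipping spin i, ΔE_i = 2 s_i (J(s_{i-1} + s_{i+1}) + h). It does this twice: once with a scalar loop, and once with SSE intrinsics that handle four sites per instruction. The names `delta_e_scalar`, `delta_e_sse` and `N` are illustrative only.

```c
#include <xmmintrin.h> /* SSE intrinsics; assumes an x86/x86-64 target */

#define N 16 /* hypothetical chain length; must be a multiple of 4 for the SSE loop */

/* Scalar reference: dE[i] = 2 s_i (J (s_{i-1} + s_{i+1}) + h), periodic boundaries. */
static void delta_e_scalar(const float *s, float J, float h, float *dE) {
    for (int i = 0; i < N; ++i) {
        float left = s[(i + N - 1) % N];
        float right = s[(i + 1) % N];
        dE[i] = 2.0f * s[i] * (J * (left + right) + h);
    }
}

/* Explicitly vectorized version: four sites per SSE instruction. */
static void delta_e_sse(const float *s, float J, float h, float *dE) {
    /* Pad the chain so that pad[i] = s[i-1], pad[i+1] = s[i], pad[i+2] = s[i+1]
       holds for every i, including the wrap-around at both ends. */
    float pad[N + 2];
    pad[0] = s[N - 1];
    for (int i = 0; i < N; ++i) pad[i + 1] = s[i];
    pad[N + 1] = s[0];

    const __m128 vJ = _mm_set1_ps(J);
    const __m128 vh = _mm_set1_ps(h);
    const __m128 two = _mm_set1_ps(2.0f);
    for (int i = 0; i < N; i += 4) {
        __m128 left = _mm_loadu_ps(&pad[i]);      /* s[i-1 .. i+2] */
        __m128 mid = _mm_loadu_ps(&pad[i + 1]);   /* s[i   .. i+3] */
        __m128 right = _mm_loadu_ps(&pad[i + 2]); /* s[i+1 .. i+4] */
        /* local field J (left + right) + h for four sites at once */
        __m128 field = _mm_add_ps(_mm_mul_ps(vJ, _mm_add_ps(left, right)), vh);
        _mm_storeu_ps(&dE[i], _mm_mul_ps(two, _mm_mul_ps(mid, field)));
    }
}
```

To use it, fill a spin array of `N` values of ±1, call both functions, and compare the outputs. Only the ΔE computation is data-parallel here. A full Metropolis sweep cannot accept flips of neighbouring spins computed from the same stale fields. Vectorized implementations therefore commonly update one sublattice at a time (for a chain, all even sites and then all odd sites).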
26 Citations
Performance potential for simulating spin models on GPU
  • M. Weigel
  • Computer Science, Physics
  • J. Comput. Phys.
  • 2012
  • 56
Simulating spin models on GPU
  • M. Weigel
  • Computer Science, Physics
  • Comput. Phys. Commun.
  • 2011
  • 55
  • Highly Influenced
A Comparative Evaluation of Parallel Programming Models for Shared-Memory Architectures
  • 4
Using Intra-Core Loop-Task Accelerators to Improve the Productivity and Performance of Task-Based Parallel Programs
  • 7
Numerical characterization of nonlinear dynamical systems using parallel computing: The role of GPUs approach
  • 7