Performance Analysis of Traditional and Data-Parallel Primitive Implementations of Visualization and Analysis Kernels

E. Wes Bethel, David Camp, T. Perciano, Colleen Heinemann
Measurements of absolute runtime are useful as a summary of performance when studying parallel visualization and analysis methods on computational platforms of increasing concurrency and complexity. We can gain further insight by examining more detailed measures from hardware performance counters, such as the number of instructions executed by an algorithm implemented in a particular way, the amount of data moved to/from memory, memory hierarchy utilization levels via cache…

Shared-Memory Parallel Probabilistic Graphical Modeling Optimization: Comparison of Threads, OpenMP, and Data-Parallel Primitives
This study is the first of its type to do performance analysis using hardware counters for comparing methods based on VTK-m-based data-parallel primitives with those based on more traditional OpenMP or threads-based parallelism, as there is increasing awareness of the need for platform portability in light of increasing node-level parallelism and increasing device heterogeneity.
Ray tracing within a data parallel framework
This work presents a method for ray tracing consisting entirely of data parallel primitives, and finds that the data parallel approach leads to results that are acceptable for many scientific visualization use cases, with the key benefit of providing a single code base that can run on many architectures.
Kokkos: Enabling manycore performance portability through polymorphic memory access patterns
VTK-m: Accelerating the Visualization Toolkit for Massively Threaded Architectures
The VTK-m framework serves as a container for algorithms, provides flexible data representation, and simplifies the design of visualization algorithms on new and future computer architecture.
Computer Organization and Design, Fourth Edition: The Hardware/Software Interface (The Morgan Kaufmann Series in Computer Architecture and Design)
The classic textbook for computer systems analysis and design, Computer Organization and Design, has been thoroughly updated to provide a new focus on the revolutionary change taking place in
Performance-Portable Particle Advection with VTK-m
This paper proposes a performance-portable algorithm for particle advection based on the recently introduced VTK-m system and chiefly relies on its device adapter abstraction, and demonstrates the general portability of the implementation across a wide variety of hardware.
Compiling Stencils in High Performance Fortran
This paper presents a general-purpose compiler optimization strategy that generates efficient code for a wide class of stencil computations expressed using Fortran 90 array constructs by orchestrating a set of program transformations that minimize both intraprocessor and interprocessor data movement implied by Fortran 90 array operations.
Maximal clique enumeration with data-parallel primitives
This work considers maximal clique enumeration on shared-memory, multi-core architectures and introduces an approach consisting entirely of data-parallel operations, in an effort to achieve efficient and portable performance across different architectures.
Dark Silicon and the End of Multicore Scaling
A comprehensive study that projects the speedup potential of future multicores and examines the underutilization of integration capacity (dark silicon) is timely and crucial.