DPP-PMRF: Rethinking Optimization for a Probabilistic Graphical Model Using Data-Parallel Primitives

Brenton Lessley, T. Perciano, Colleen Heinemann, David Camp, Hank Childs, and E. Wes Bethel. 2018 IEEE 8th Symposium on Large Data Analysis and Visualization (LDAV).

We present a new parallel algorithm for probabilistic graphical model optimization. The algorithm relies on data-parallel primitives (DPPs), which provide portable performance across hardware architectures. We evaluate the algorithm on CPUs and GPUs for an image segmentation problem. Compared to a serial baseline, we observe runtime speedups of up to 13X (CPU) and 44X (GPU). We also compare our performance against a reference OpenMP-based algorithm and find speedups of up to 7X (CPU).
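
The core idea of expressing MRF optimization through data-parallel primitives can be illustrated with a minimal sketch. This is not the DPP-PMRF algorithm: it assumes a single synchronous ICM-style label update on a tiny MRF with a squared-error data term and a Potts smoothness term weighted by `beta`. The names `dpp_map`, `dpp_reduce_by_key`, and `icm_step` are hypothetical serial stand-ins for primitives that a DPP library would execute in parallel on a CPU or GPU.

```python
def dpp_map(fn, data):
    """Map primitive: apply fn to every element independently (serial stand-in)."""
    return [fn(x) for x in data]

def dpp_reduce_by_key(keys, values, op):
    """ReduceByKey primitive: combine runs of equal adjacent keys with op."""
    out_keys, out_vals = [], []
    for k, v in zip(keys, values):
        if out_keys and out_keys[-1] == k:
            out_vals[-1] = op(out_vals[-1], v)
        else:
            out_keys.append(k)
            out_vals.append(v)
    return out_keys, out_vals

def icm_step(labels, observed, neighbors, num_labels, beta):
    """One synchronous ICM-style update expressed as map + reduce-by-key."""
    # Expand every (vertex, candidate label) pair; keys come out sorted by vertex.
    pairs = [(v, l) for v in range(len(labels)) for l in range(num_labels)]

    def energy(pair):
        v, l = pair
        data_term = (observed[v] - l) ** 2                       # fit to the observation
        smooth_term = beta * sum(labels[u] != l for u in neighbors[v])  # Potts penalty
        return (data_term + smooth_term, l)                      # compare by energy, then label

    energies = dpp_map(energy, pairs)
    keys = [v for v, _ in pairs]
    _, best = dpp_reduce_by_key(keys, energies, min)             # lowest-energy label per vertex
    return [l for _, l in best]

neighbors = [[1], [0, 2], [1, 3], [2, 4], [3]]  # 1D chain of five pixels
observed = [0.1, 0.2, 0.9, 0.1, 0.0]
print(icm_step([0, 0, 1, 0, 0], observed, neighbors, num_labels=2, beta=1.0))
# -> [0, 0, 0, 0, 0]
```

With `beta=1.0` the isolated noisy pixel is relabeled to match its neighbors; with `beta=0` each pixel simply takes the label closest to its observation. Every energy evaluation in the map step is independent, which is what lets a DPP backend parallelize it.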

Shared-Memory Parallel Probabilistic Graphical Modeling Optimization: Comparison of Threads, OpenMP, and Data-Parallel Primitives
This study is the first of its kind to use hardware counters to compare the performance of methods based on VTK-m data-parallel primitives with those based on more traditional OpenMP or threads-based parallelism, motivated by growing awareness of the need for platform portability as node-level parallelism and device heterogeneity increase.
High Performance Computing: 35th International Conference, ISC High Performance 2020, Frankfurt/Main, Germany, June 22–25, 2020, Proceedings
FASTHash, a “truly” high-throughput parallel hash table implementation using FPGA on-chip SRAM, is developed; the work also provides a theoretical worst-case bound on the number of erroneous queries (true negative searches, duplicate inserts) caused by relaxed eventual consistency.
XVis: Visualization for the Extreme-Scale Scientific Computation Ecosystem, Final Report
The XVis project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressing four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.
Performance Tradeoffs in Shared-memory Platform Portable Implementations of a Stencil Kernel
This story is based on a manuscript originally written by Matthew Larsen in 2015 and then edited by Colleen Heinemann and Talita Perciano in 2016.
Performance Analysis of Traditional and Data-Parallel Primitive Implementations of Visualization and Analysis Kernels
This work focuses on performance analysis, on modern multi-core platforms, of three different visualization and analysis kernels, each implemented in two different ways: one "traditional", using combinations of C++ and VTK, and the other a data-parallel approach using VTK-m.


Volume Rendering Via Data-Parallel Primitives
An unstructured-data volume rendering algorithm composed entirely of data-parallel primitives is introduced, and it is demonstrated that its performance on GPUs is comparable to code written for and optimized for the GPU, and its performance on CPUs is comparable to code written for and optimized for the CPU.
Reduced-complexity image segmentation under parallel Markov Random Field formulation using graph partitioning
PMRF, an MRF-based framework that overcomes the NP-hard complexity of MRF optimization by using graph partitioning, is introduced; computational complexity decreases because optimization and parameter estimation are executed on small subgraphs.
Distributed memory parallel Markov random fields using graph partitioning
This work developed a general purpose distributed memory parallel MRF-based image analysis framework, MPI-PMRF, which overcomes performance and memory limitations by distributing data and computations across processors.
Linear and Parallel Learning of Markov Random Fields
A new embarrassingly parallel parameter-learning algorithm is presented for Markov random fields with untied parameters; it is efficient for a large class of practical models, and for log-linear models it is also data efficient, requiring only the local sufficient statistics of the data to estimate parameters.
PISTON: A Portable Cross-Platform Framework for Data-Parallel Visualization Operators
This work devises a framework for creating high-performance parallel visualization and analysis operators that employs the data-parallel programming model, implements isosurface, cut surface, and threshold operators, and achieves good parallel performance on two different architectures using the exact same operator code.
Volume rendering with data parallel visualization frameworks for emerging high performance computing architectures
This work implements ray casting and cell projection volume renderers in Dax using DPPs, compares their performance on three different hardware architectures, and observes that additional architecture-specific modifications are necessary to achieve acceptable performance on some architectures.
External Facelist Calculation with Data-Parallel Primitives
Overall, the hashing-based implementation achieves better runtime performance for the majority of configurations, while also achieving the most stable performance on highly unstructured data sets.
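
The hashing idea behind external facelist calculation can be sketched as follows, assuming a mesh of tetrahedra given as 4-tuples of vertex ids: a face shared by two cells is interior, so the external faces are exactly those whose hash key occurs once. `tet_faces` and `external_faces` are hypothetical names, and Python's `Counter` (a hash table) stands in for the parallel hash table a DPP implementation would build.

```python
from collections import Counter
from itertools import combinations

def tet_faces(cell):
    # Each tetrahedron has four triangular faces; sorting vertex ids makes the
    # same face, seen from two neighboring cells, produce the same hash key.
    return [tuple(sorted(f)) for f in combinations(cell, 3)]

def external_faces(cells):
    counts = Counter(face for cell in cells for face in tet_faces(cell))
    # Interior faces are shared by exactly two cells; external faces appear once.
    return sorted(face for face, n in counts.items() if n == 1)

cells = [(0, 1, 2, 3), (1, 2, 3, 4)]  # two tetrahedra sharing face (1, 2, 3)
print(external_faces(cells))
# -> [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]
```

The shared face is the only key counted twice, so it is the only one dropped.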
Vector Models for Data-Parallel Computing
A model of parallelism that extends and formalizes the data-parallel model on which the Connection Machine and other supercomputers are based is described, and it is argued that data-parallel models are not only practical and applicable to a surprisingly wide variety of problems, but also well suited to very-high-level languages, leading to concise and clear descriptions of algorithms and their complexity.
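
Scan (all-prefix-sums) is the central primitive of Blelloch's scan-vector model. The sketch below is a serial rendering of the work-efficient up-sweep/down-sweep exclusive scan, followed by stream compaction built on it; the function names are hypothetical, and each inner `for` loop stands for one parallel step whose iterations are independent.

```python
def exclusive_scan(values):
    """Work-efficient exclusive +-scan (up-sweep / down-sweep)."""
    n = 1
    while n < len(values):
        n *= 2
    a = list(values) + [0] * (n - len(values))  # pad to a power of two
    # Up-sweep (reduce): each right child accumulates its left sibling's sum.
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            a[i] += a[i - d]
        d *= 2
    # Down-sweep: clear the root; each node passes its value to its left child
    # and (its value + the old left-child sum) to its right child.
    a[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):
            a[i - d], a[i] = a[i], a[i] + a[i - d]
        d //= 2
    return a[: len(values)]

def compact(values, keep):
    """Stream compaction: scan the keep-flags to get each survivor's output slot."""
    flags = [1 if keep(v) else 0 for v in values]
    offsets = exclusive_scan(flags)
    out = [None] * (offsets[-1] + flags[-1] if values else 0)
    for v, f, o in zip(values, flags, offsets):
        if f:
            out[o] = v  # independent scattered writes
    return out

print(exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))  # -> [0, 3, 4, 11, 11, 15, 16, 22]
print(compact([5, 2, 8, 1, 9], lambda x: x > 4))  # -> [5, 8, 9]
```

Both phases perform O(n) additions in total over O(log n) parallel steps, which is why scan serves as a building block for so many data-parallel algorithms.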
Ray tracing within a data parallel framework
This work presents a method for ray tracing consisting entirely of data-parallel primitives, and finds that the data-parallel approach leads to results that are acceptable for many scientific visualization use cases, with the key benefit of providing a single code base that can run on many architectures.
Towards dense linear algebra for hybrid GPU accelerated manycore systems