DPP-PMRF: Rethinking Optimization for a Probabilistic Graphical Model Using Data-Parallel Primitives

Brenton Lessley, T. Perciano, Colleen Heinemann, David Camp, Hank Childs, E. Wes Bethel
2018 IEEE 8th Symposium on Large Data Analysis and Visualization (LDAV)
We present a new parallel algorithm for probabilistic graphical model optimization. The algorithm relies on data-parallel primitives (DPPs), which provide portable performance across hardware architectures. We evaluate results on CPUs and GPUs for an image segmentation problem. Compared to a serial baseline, we observe runtime speedups of up to 13X (CPU) and 44X (GPU). We also compare our performance to a reference OpenMP-based algorithm and find speedups of up to 7X (CPU).
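
To make the notion of a data-parallel primitive concrete, the following is a minimal sketch in plain Python, not the paper's implementation. The function names (`dpp_map`, `dpp_reduce`, `dpp_exclusive_scan`, `dpp_compact`) are hypothetical and chosen for illustration only. The code runs serially; the point is that each primitive has a well-known parallel formulation, so an algorithm written purely as a composition of such primitives can in principle be retargeted to CPUs or GPUs by swapping the primitive implementations.

```python
from functools import reduce
from itertools import accumulate
from operator import add


def dpp_map(fn, values):
    """Map: apply fn to every element independently."""
    return [fn(v) for v in values]


def dpp_reduce(op, values, initial):
    """Reduce: combine all elements with an associative operator."""
    return reduce(op, values, initial)


def dpp_exclusive_scan(values):
    """Exclusive prefix sum: output[i] is the sum of values[0..i-1]."""
    # accumulate(..., initial=0) requires Python 3.8 or newer.
    return list(accumulate(values, add, initial=0))[:-1]


def dpp_compact(values, predicate):
    """Stream compaction composed from map, scan, reduce, and a scatter."""
    flags = dpp_map(lambda v: 1 if predicate(v) else 0, values)
    offsets = dpp_exclusive_scan(flags)      # output slot for each kept element
    out = [None] * dpp_reduce(add, flags, 0)  # total number of kept elements
    for i, value in enumerate(values):        # scatter: writes never collide
        if flags[i]:
            out[offsets[i]] = value
    return out


# Illustrative data only: keep the nonzero entries of a small label array.
labels = [3, 0, 7, 0, 5]
nonzero = dpp_compact(labels, lambda v: v != 0)
```

Here `dpp_compact` never branches on neighbouring elements or shares mutable state between iterations, which is the property that lets a primitive-based formulation map onto different parallel back ends.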


Shared-Memory Parallel Probabilistic Graphical Modeling Optimization: Comparison of Threads, OpenMP, and Data-Parallel Primitives

This study is the first of its kind to use hardware performance counters to compare methods built on VTK-m data-parallel primitives with those built on more traditional OpenMP or threads-based parallelism, motivated by growing awareness of the need for platform portability as node-level parallelism and device heterogeneity increase.

Performance Analysis of Traditional and Data-Parallel Primitive Implementations of Visualization and Analysis Kernels

This work focuses on performance analysis, on modern multi-core platforms, of three visualization and analysis kernels, each implemented in two ways: a "traditional" one using combinations of C++ and VTK, and a data-parallel one using VTK-m.

High Performance Computing: 35th International Conference, ISC High Performance 2020, Frankfurt/Main, Germany, June 22–25, 2020, Proceedings

This work develops FASTHash, a "truly" high-throughput parallel hash table implementation using FPGA on-chip SRAM, and provides a theoretical worst-case bound on the number of erroneous queries (true negative search, duplicate inserts) due to relaxed eventual consistency.

XVis: Visualization for the Extreme-Scale Scientific Computation Ecosystem, Final Report

The XVis project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressing four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.

Performance Tradeoffs in Shared-memory Platform Portable Implementations of a Stencil Kernel

This story is based on a manuscript originally written by Matthew Larsen in 2015 and then edited by Colleen Heinemann and Talita Perciano in 2016.



Reduced-complexity image segmentation under parallel Markov Random Field formulation using graph partitioning

This work introduces PMRF, an MRF-based framework that overcomes the NP-hard complexity of MRF optimization by using graph partitioning; computational complexity decreases because optimization and parameter estimation are executed on small subgraphs.

Distributed memory parallel Markov random fields using graph partitioning

This work developed a general purpose distributed memory parallel MRF-based image analysis framework, MPI-PMRF, which overcomes performance and memory limitations by distributing data and computations across processors.

Linear and Parallel Learning of Markov Random Fields

This work presents a new embarrassingly parallel parameter learning algorithm for Markov random fields with untied parameters that is efficient for a large class of practical models; for log-linear models it is also data efficient, requiring only the local sufficient statistics of the data to estimate parameters.

Maximal clique enumeration with data-parallel primitives

This work considers maximal clique enumeration on shared-memory, multi-core architectures and introduces an approach consisting entirely of data-parallel operations, in an effort to achieve efficient and portable performance across different architectures.

The STAPL Parallel Graph Library

The library introduces pGraph pViews that separate algorithm design from the container implementation, and supports three graph processing algorithmic paradigms, level-synchronous, asynchronous and coarse-grained, and provides common graph algorithms based on them.

PISTON: A Portable Cross-Platform Framework for Data-Parallel Visualization Operators

This work devises a framework for creating high-performance parallel visualization and analysis operators that employs the data-parallel programming model, implements isosurface, cut surface, and threshold operators, and achieves good parallel performance on two different architectures using the exact same operator code.

Volume rendering with data parallel visualization frameworks for emerging high performance computing architectures

This work implements ray casting and cell projection volume renderers in Dax using DPPs, compares their performance on three different hardware architectures, and observes that additional architecture-specific modifications are necessary to achieve acceptable performance on some architectures.

External Facelist Calculation with Data-Parallel Primitives

Overall, it is observed that the hashing-based implementation achieves better runtime performance for the majority of configurations, while also achieving the most-stable performance on highly unstructured data sets.

Vector Models for Data-Parallel Computing

This work describes a model of parallelism that extends and formalizes the data-parallel model on which the Connection Machine and other supercomputers are based. It argues that data-parallel models are not only practical and applicable to a surprisingly wide variety of problems, but also well suited to very-high-level languages, and that they lead to a concise and clear description of algorithms and their complexity.

Ray tracing within a data parallel framework

This work presents a method for ray tracing consisting entirely of data-parallel primitives, and finds that the data-parallel approach leads to results that are acceptable for many scientific visualization use cases, with the key benefit of providing a single code base that can run on many architectures.