Concurrency is pervasive and perplexing, particularly on graphics processing units (GPUs). Current specifications of languages and hardware are inconclusive; thus programmers often rely on folklore assumptions when writing software. To remedy this state of affairs, we conducted a large empirical study of the concurrent behaviour of deployed GPUs. Armed…
A <em>memory consistency model</em> (MCM) is the part of a programming language or computer architecture specification that defines which values can legally be read from shared memory locations. Because MCMs take into account various optimisations employed by architectures and compilers, they are often complex and counterintuitive, which makes them…
We present a technique for the formal verification of GPU kernels, addressing two classes of correctness properties: data races and barrier divergence. Our approach is founded on a novel formal operational semantics for GPU kernels termed <em>synchronous, delayed visibility (SDV)</em> semantics, which captures the execution of a GPU kernel by…
We propose a new formalisation of stability for Rely-Guarantee, in which an assertion's stability is encoded into its syntactic form. This allows two advances in modular reasoning. Firstly, it enables Rely-Guarantee, for the first time, to verify concurrent libraries independently of their clients' environments. Secondly, in a sequential setting, it allows…
A program proof should not merely certify that a program is correct; it should explain why it is correct. A proof should be more than 'true': it should be informative, and it should be intelligible. Extending work by Bean [1], we introduce a system that produces readable program proofs that are highly scalable and easily modified. The de facto standard for…
We propose a model of computation, based on data flow, that unifies several disparate programming phenomena, including local and shared variables, synchronised and buffered communication, reliable and unreliable channels, dynamic and static allocation, explicit and garbage-collected disposal, fine-grained and coarse-grained concurrency, and weakly and…
We provide a case study of work-stealing, a popular method for run-time load balancing, on FPGAs. Following the Cederman-Tsigas implementation for GPUs, we synchronize work-items not with locks, mutexes or critical sections, but instead with the atomic operations provided by Altera's OpenCL SDK. We evaluate work-stealing for FPGAs by synthesizing a K-means…