Learn More
Several domain-specific languages (DSLs) for parallel graph analytics have been proposed recently. In this paper, we argue that existing DSLs can be implemented on top of a general-purpose infrastructure that (i) supports very fine-grain tasks, (ii) implements autonomous, speculative execution of these tasks, and (iii) allows application-specific control of(More)
For more than thirty years, the parallel programming community has used the <i>dependence graph</i> as the main abstraction for reasoning about and exploiting parallelism in "regular" algorithms that use dense arrays, such as finite-differences and FFTs. In this paper, we argue that the dependence graph is not a suitable abstraction for algorithms in new(More)
Performance tuning for data centers is essential and complicated. It is important since a data center comprises thousands of machines and thus a single-digit performance improvement can significantly reduce cost and power consumption. Unfortunately, it is extremely difficult as data centers are dynamic environments where applications are frequently released(More)
Speculative execution is a promising approach for exploiting parallelism in many programs, but it requires efficient schemes for detecting conflicts between concurrently executing threads. Prior work has argued that checking semantic commutativity of method invocations is the right way to detect conflicts for complex data structures such as kd-trees.(More)
Non-determinism in program execution can make program development and debugging difficult. In this paper, we argue that solutions to this problem should be on-demand, portable and parameterless. On-demand means that the programming model should permit the writing of non-deterministic programs since these programs often perform better than deterministic ones(More)
Irregular algorithms are organized around pointer-based data structures such as graphs and trees, and they are ubiquitous in applications. Recent work by the Galois project has provided a systematic approach for parallelizing irregular applications based on the idea of optimistic or speculative execution of programs. However, the overhead of optimistic(More)
Graph partitioning is a common and frequent preprocess-ing step in many high-performance parallel applications on distributed-and shared-memory architectures. It is used to distribute graphs across memory and to improve spatial locality. There are several parallel implementations of graph partitioning for distributed-memory architectures. In this paper, we(More)
The problem of priority scheduling arises in many algorithms. There is a dynamic pool of tasks that can be executed in any order, but some execution orders are more efficient than others. To exploit such schedules, each task is associated with an application-specific priority, and a system must schedule tasks to minimize priority inversions and scheduling(More)
Many sparse matrix computations can be speeded up if the matrix is first reordered. Reordering was originally developed for direct methods but it has recently become popular for improving the cache locality of parallel iterative solvers since reordering the matrix to reduce bandwidth and wavefront can improve the locality of reference of sparse(More)