On the Sublinear Processor Gap for Parallel Architectures

Alejandro López-Ortiz and Alejandro Salinger
In the past, parallel algorithms were developed, for the most part, under the assumption that the number of processors is Θ(n) (where n is the size of the input) and that if in practice the actual number was smaller, this could be resolved using Brent’s Lemma to simulate the highly parallel solution on a lower-degree parallel architecture. In this paper, however, we argue that design and implementation issues of algorithms and architectures are significantly different—both in theory and in… 
1 Citation

Models for Parallel Computation in Multi-Core, Heterogeneous, and Ultra Wide-Word Architectures

Low-degree parallelism in computation is explored, providing evidence of fundamental differences, in both practice and theory, between systems with a sublinear and a linear number of processors, and suggesting a sharp theoretical gap between the classes of problems that are efficiently parallelizable in each case.



A Complexity Theory of Efficient Parallel Algorithms

Fundamental parallel algorithms for private-cache chip multiprocessors

This paper presents two sorting algorithms, a distribution sort and a mergesort, and studies sorting lower bounds in the parallel external-memory (PEM) model, a computational model that formalizes the essential properties of algorithms for private-cache CMPs.

Provably good multicore cache performance for divide-and-conquer algorithms

It is shown that a separator-based algorithm for sparse-matrix-dense-vector-multiply achieves provably good cache performance in the multicore-cache model, as well as in the well-studied sequential cache-oblivious model.

I/O-Optimal Distribution Sweeping on Private-Cache Chip Multiprocessors

A new one-dimensional batched range counting algorithm on a sorted list of ranges and points that achieves an I/O complexity of $O((N + K)/PB)$, where $K$ is the sum of the counts of all the ranges.

Provably efficient scheduling for languages with fine-grained parallelism

The paper identifies a class of parallel schedules that are provably efficient in both time and space and describes a scheduler for implementing high-level languages with nested parallelism that generates schedules in this class.

Optimal speedup on a low-degree multi-core parallel architecture (LoPRAM)

This paper proposes a model of low degree parallelism (LoPRAM) which builds upon the RAM and PRAM models yet better reflects recent advances in parallel (multi-core) architectures and shows that in many instances it naturally leads to work-optimal parallel algorithms via simple modifications to sequential algorithms.

Limits to Parallel Computation: P-Completeness Theory

In providing an up-to-date survey of parallel computing research from 1994, Topics in Parallel Computing will prove invaluable to researchers and professionals with an interest in the super computers of the future.

Geometric Algorithms for Private-Cache Chip Multiprocessors - (Extended Abstract)

This work shows how to obtain optimal algorithms for interval stabbing counting, 1-D range counting, weighted 2-D dominance counting, and for computing 3-D maxima, 2-D lower envelopes, and 2-D convex hulls.

Parallel external memory graph algorithms

All the solutions on a P-processor PEM model provide an optimal speedup of Θ(P) in parallel I/O complexity and parallel computation time, compared to the single-processor external memory counterparts.

Cache-efficient dynamic programming algorithms for multicores

This work develops a generic CMP algorithm with an associated tiling sequence and provides a parallel schedule that results in a cache-efficient parallel execution up to the critical path length of the underlying dynamic programming algorithm.