# On the Sublinear Processor Gap for Parallel Architectures

@inproceedings{LpezOrtiz2013OnTS, title={On the Sublinear Processor Gap for Parallel Architectures}, author={Alejandro L{\'o}pez-Ortiz and Alejandro Salinger}, booktitle={TAMC}, year={2013} }

In the past, parallel algorithms were developed, for the most part, under the assumption that the number of processors is Θ(n) (where n is the size of the input) and that if in practice the actual number was smaller, this could be resolved using Brent’s Lemma to simulate the highly parallel solution on a lower-degree parallel architecture. In this paper, however, we argue that design and implementation issues of algorithms and architectures are significantly different—both in theory and in…
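Brent's Lemma bounds the cost of that simulation. Suppose a computation performs $W$ operations in $T$ parallel steps, with $m_i$ operations in step $i$. Then $p$ processors can execute step $i$ in $\lceil m_i/p \rceil \le m_i/p + 1$ rounds, so $T_p \le W/p + T$. The minimal sketch below checks this arithmetic on a balanced tree sum. The function names are illustrative, and the model is an idealized PRAM with no memory or scheduling costs.

```python
import math


def brent_time(ops_per_step: list[int], p: int) -> int:
    """Rounds needed on p processors if step i's m_i independent
    operations are executed ceil(m_i / p) at a time."""
    if p < 1:
        raise ValueError("need at least one processor")
    return sum(math.ceil(m / p) for m in ops_per_step)


def brent_bound(ops_per_step: list[int], p: int) -> float:
    """Brent's bound W/p + T (total work over p, plus the number of steps)."""
    work = sum(ops_per_step)
    depth = len(ops_per_step)
    return work / p + depth


# Balanced binary-tree sum of n = 1024 values: log2(n) = 10 steps with
# n/2, n/4, ..., 1 additions per step.
n = 1024
steps = [n >> k for k in range(1, int(math.log2(n)) + 1)]
for p in (4, 32, 1024):
    print(p, brent_time(steps, p), brent_bound(steps, p))
```

With few processors the $W/p$ term dominates, and with $p \ge n/2$ the time collapses to the depth $T = 10$.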

## One Citation

### Models for Parallel Computation in Multi-Core, Heterogeneous, and Ultra Wide-Word Architectures

- Computer Science
- 2013

Low-degree parallelism in computation is explored, providing evidence of fundamental differences in practice and theory between systems with a sublinear and a linear number of processors, and suggesting a sharp theoretical gap between the classes of problems that are efficiently parallelizable in each case.

## References

The first 10 of 28 references are listed below.

### A Complexity Theory of Efficient Parallel Algorithms

- Computer Science · Theor. Comput. Sci.
- 1990

### Fundamental parallel algorithms for private-cache chip multiprocessors

- Computer Science · SPAA '08
- 2008

This paper presents two sorting algorithms, a distribution sort and a mergesort, and studies sorting lower bounds in a computational model called the parallel external-memory (PEM) model, which formalizes the essential properties of algorithms for private-cache chip multiprocessors (CMPs).
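To make PEM cost accounting concrete, here is a minimal sketch that assumes the standard model: $P$ processors, each with a private cache, share a main memory and move data in blocks of $B$ elements. Cost is counted in parallel I/Os, where one parallel I/O lets every processor transfer one block. The function `pem_block_sum` is a hypothetical name. It simulates only the scan phase of a sum, assigning whole blocks round-robin, and it does not charge for combining the $P$ partial sums.

```python
import math


def pem_block_sum(data: list[int], p: int, b: int) -> tuple[int, int]:
    """Sum `data` with p simulated PEM processors and block size b.

    Returns (total, parallel I/Os spent scanning the input).
    """
    num_blocks = math.ceil(len(data) / b)
    partial = [0] * p    # each processor's partial sum, held in its private cache
    transfers = [0] * p  # block transfers performed by each processor
    for blk in range(num_blocks):
        proc = blk % p   # round-robin assignment of whole blocks
        transfers[proc] += 1
        partial[proc] += sum(data[blk * b:(blk + 1) * b])
    # One parallel I/O moves one block per processor, so the scan takes as
    # many steps as the busiest processor needs: ceil(ceil(N / B) / P).
    return sum(partial), max(transfers)


print(pem_block_sum(list(range(1000)), 4, 16))
print(pem_block_sum(list(range(1000)), 1, 16))
```

With $N = 1000$ and $B = 16$ there are 63 blocks, so one processor needs 63 parallel I/Os while four need 16. This reflects the $O(N/(PB))$ scan cost the model is designed to capture.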

### Provably good multicore cache performance for divide-and-conquer algorithms

- Computer Science · SODA '08
- 2008

It is shown that a separator-based algorithm for sparse-matrix dense-vector multiplication achieves provably good cache performance in the multicore-cache model, as well as in the well-studied sequential cache-oblivious model.
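The sketch below shows the divide-and-conquer shape of a sparse-matrix dense-vector multiply on a matrix in compressed sparse row (CSR) form. It recursively halves the row range until a leaf size is reached. This is an assumption-laden illustration and not the paper's algorithm: it computes no graph separators and applies no separator-based reordering, which is where the paper's cache guarantees come from. The function names and the `leaf` parameter are hypothetical.

```python
def _spmv_rows(indptr, indices, values, x, lo, hi, y, leaf):
    """Compute y[lo:hi] = A[lo:hi, :] @ x by recursive row halving."""
    if hi - lo <= max(leaf, 1):
        for row in range(lo, hi):
            acc = 0.0
            for k in range(indptr[row], indptr[row + 1]):
                acc += values[k] * x[indices[k]]
            y[row] = acc
        return
    mid = (lo + hi) // 2
    # The two halves write disjoint parts of y, so a parallel runtime
    # could execute these two calls concurrently.
    _spmv_rows(indptr, indices, values, x, lo, mid, y, leaf)
    _spmv_rows(indptr, indices, values, x, mid, hi, y, leaf)


def spmv(indptr, indices, values, x, leaf=64):
    """Return A @ x for a CSR matrix given by (indptr, indices, values)."""
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    _spmv_rows(indptr, indices, values, x, 0, n_rows, y, leaf)
    return y


# A = [[2, 0, 1], [0, 3, 0], [4, 0, 5]] in CSR form.
print(spmv([0, 2, 3, 5], [0, 2, 1, 0, 2], [2.0, 1.0, 3.0, 4.0, 5.0],
           [1.0, 2.0, 3.0], leaf=1))
```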

### I/O-Optimal Distribution Sweeping on Private-Cache Chip Multiprocessors

- Computer Science · 2011 IEEE International Parallel & Distributed Processing Symposium
- 2011

This paper presents a new one-dimensional batched range counting algorithm, operating on a sorted list of ranges and points, that achieves an I/O complexity of $O((N + K)/PB)$, where $K$ is the sum of the counts of all the ranges.
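For readers unfamiliar with the problem, the sketch below pins down what batched 1-D range counting computes. It is a plain sequential version that uses two binary searches per query on a sorted point list. It says nothing about the PEM algorithm or its I/O bound. The function name is illustrative, and ranges are assumed closed.

```python
from bisect import bisect_left, bisect_right


def batched_range_count(points: list[int], ranges: list[tuple[int, int]]) -> list[int]:
    """For each closed range [a, b], count the points p with a <= p <= b.

    `points` must be sorted in nondecreasing order.
    """
    # bisect_right(points, b) counts points <= b; bisect_left(points, a)
    # counts points < a; their difference counts points in [a, b].
    return [bisect_right(points, b) - bisect_left(points, a) for a, b in ranges]


print(batched_range_count([1, 3, 3, 7, 10], [(0, 2), (3, 3), (2, 9), (11, 20)]))
```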

### Provably efficient scheduling for languages with fine-grained parallelism

- Computer Science · SPAA '95
- 1995

The paper identifies a class of parallel schedules that are provably efficient in both time and space, and describes a scheduler that generates schedules in this class, for use in implementing high-level languages with nested parallelism.
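One way to keep a parallel schedule close to the sequential execution, and hence space-efficient, is to prioritize ready tasks by their position in the sequential depth-first (1DF) execution order. The minimal sketch below assumes that prioritization. It is a greedy schedule in that spirit and not the paper's scheduler. The computation is modeled as a DAG with a unique source `root` and ordered children. The function names and the example graph are hypothetical.

```python
import heapq


def _parent_counts(children: dict[str, list[str]]) -> dict[str, int]:
    counts: dict[str, int] = {}
    for v, cs in children.items():
        counts.setdefault(v, 0)
        for c in cs:
            counts[c] = counts.get(c, 0) + 1
    return counts


def one_df_order(children: dict[str, list[str]], root: str) -> dict[str, int]:
    """Rank of each node in the sequential depth-first execution.

    A node becomes ready once all its parents have run; newly ready
    children are explored left to right.
    """
    remaining = _parent_counts(children)
    rank: dict[str, int] = {}
    stack = [root]
    while stack:
        v = stack.pop()
        rank[v] = len(rank)
        ready = []
        for c in children.get(v, []):
            remaining[c] -= 1
            if remaining[c] == 0:
                ready.append(c)
        stack.extend(reversed(ready))  # leftmost ready child is run next
    return rank


def p_df_schedule(children: dict[str, list[str]], root: str, p: int) -> list[list[str]]:
    """Greedy schedule: each step runs up to p ready nodes, earliest 1DF rank first."""
    rank = one_df_order(children, root)
    remaining = _parent_counts(children)
    ready = [(rank[root], root)]
    steps = []
    while ready:
        batch = [heapq.heappop(ready)[1] for _ in range(min(p, len(ready)))]
        steps.append(batch)
        for v in batch:  # children become ready for the *next* step
            for c in children.get(v, []):
                remaining[c] -= 1
                if remaining[c] == 0:
                    heapq.heappush(ready, (rank[c], c))
    return steps


# Nested fork-join: a forks b and c; b forks d and e; d, e, c join at f.
dag = {"a": ["b", "c"], "b": ["d", "e"], "c": ["f"], "d": ["f"], "e": ["f"]}
print(p_df_schedule(dag, "a", 2))
print(p_df_schedule(dag, "a", 1))
```

With $p = 1$ the schedule reproduces the serial order a, b, d, e, c, f. With more processors, the schedule runs ahead of that order only as far as the processor count forces it to.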

### Optimal speedup on a low-degree multi-core parallel architecture (LoPRAM)

- Computer Science · SPAA '08
- 2008

This paper proposes a model of low degree parallelism (LoPRAM) which builds upon the RAM and PRAM models yet better reflects recent advances in parallel (multi-core) architectures and shows that in many instances it naturally leads to work-optimal parallel algorithms via simple modifications to sequential algorithms.
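As an illustration of turning a sequential algorithm into a low-degree parallel one through a simple modification, the sketch below takes an ordinary mergesort and runs the left recursive call in a new thread for the top $\lfloor \log_2 p \rfloor$ levels only. This uses at most $p$ concurrently working threads, with sequential code below that depth. It is a hypothetical example and not code from the paper. In CPython the global interpreter lock prevents real speedup for pure-Python work, so the sketch shows structure only.

```python
import threading
from heapq import merge


def merge_sort(a: list[int], spawn_depth: int) -> list[int]:
    """Mergesort that forks the left half into a new thread while spawn_depth > 0."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    if spawn_depth > 0:
        result: dict[str, list[int]] = {}

        def left_task() -> None:
            result["left"] = merge_sort(a[:mid], spawn_depth - 1)

        t = threading.Thread(target=left_task)
        t.start()
        right = merge_sort(a[mid:], spawn_depth - 1)  # current thread does the right half
        t.join()
        left = result["left"]
    else:
        # Below the spawn depth the code is exactly the sequential algorithm.
        left = merge_sort(a[:mid], 0)
        right = merge_sort(a[mid:], 0)
    return list(merge(left, right))


def low_degree_sort(a: list[int], p: int) -> list[int]:
    """Sort with at most p working threads (largest power of two <= p)."""
    depth = max(p, 1).bit_length() - 1
    return merge_sort(a, depth)


print(low_degree_sort([5, 2, 9, 1, 5, 6, 0, 3], 4))
```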

### Limits to Parallel Computation: P-Completeness Theory

- Computer Science
- 1995

A survey of P-completeness theory, which gives evidence that certain problems solvable in polynomial time are inherently sequential, that is, unlikely to have NC algorithms that run in polylogarithmic time on a polynomial number of processors.

### Geometric Algorithms for Private-Cache Chip Multiprocessors - (Extended Abstract)

- Computer Science · ESA
- 2010

This work shows how to obtain optimal algorithms for interval stabbing counting, 1-D range counting, weighted 2-D dominance counting, and for computing 3-D maxima, 2-D lower envelopes, and 2-D convex hulls.
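To clarify the first of these problems, the sketch below gives a sequential version of interval stabbing counting: for each query point $q$, count the closed intervals $[s, e]$ with $s \le q \le e$. It relies on two sorted endpoint lists and says nothing about the paper's PEM algorithm. The function name is illustrative, and each interval is assumed to satisfy $s \le e$.

```python
from bisect import bisect_left, bisect_right


def stabbing_counts(intervals: list[tuple[int, int]], queries: list[int]) -> list[int]:
    """Number of closed intervals [s, e] (with s <= e) containing each query point."""
    starts = sorted(s for s, _ in intervals)
    ends = sorted(e for _, e in intervals)
    # [s, e] contains q iff s <= q and e >= q. Any interval with e < q also
    # has s <= q, so the count is #(s <= q) - #(e < q).
    return [bisect_right(starts, q) - bisect_left(ends, q) for q in queries]


print(stabbing_counts([(1, 4), (2, 6), (5, 8)], [0, 2, 5, 7, 9]))
```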

### Parallel external memory graph algorithms

- Computer Science · 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
- 2010

All of the paper's solutions for the P-processor PEM model achieve an optimal speedup of Θ(P) in parallel I/O complexity and parallel computation time over their single-processor external-memory counterparts.

### Cache-efficient dynamic programming algorithms for multicores

- Computer Science · SPAA '08
- 2008

This work develops a generic CMP algorithm with an associated tiling sequence and provides a parallel schedule that results in a cache-efficient parallel execution up to the critical path length of the underlying dynamic programming algorithm.
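As a rough illustration of how tiling exposes parallelism in a dynamic program, the sketch below fills the longest common subsequence (LCS) table in square tiles, processed in anti-diagonal wavefronts. All tiles on one wavefront depend only on earlier wavefronts, so they could run concurrently. The wavefront count bounds the parallel steps, playing the role of a critical path. This is a simple fixed-size tiling chosen for the example and not the paper's generic CMP algorithm or tiling sequence. The function name and the `tile` parameter are hypothetical.

```python
def lcs_length_tiled(x: str, y: str, tile: int = 4) -> int:
    """LCS length of x and y, computing the DP table tile by tile."""
    n, m = len(x), len(y)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    tiles_i = (n + tile - 1) // tile
    tiles_j = (m + tile - 1) // tile
    for d in range(tiles_i + tiles_j - 1):  # one anti-diagonal wavefront per iteration
        # Tiles (ti, tj) with ti + tj == d read only cells from tiles on
        # earlier wavefronts, so this loop's iterations are independent.
        for ti in range(max(0, d - tiles_j + 1), min(tiles_i, d + 1)):
            tj = d - ti
            for i in range(ti * tile + 1, min((ti + 1) * tile, n) + 1):
                for j in range(tj * tile + 1, min((tj + 1) * tile, m) + 1):
                    if x[i - 1] == y[j - 1]:
                        dp[i][j] = dp[i - 1][j - 1] + 1
                    else:
                        dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


print(lcs_length_tiled("ABCBDAB", "BDCABA", tile=2))
```

Within a tile, cells are filled row by row, so all of a cell's neighbors to the top, left, and top-left are ready before the cell is computed.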