# Asymptotic Optimality of Parallel Short Division

@article{Emmart2016AsymptoticOO, title={Asymptotic Optimality of Parallel Short Division}, author={Niall Emmart and C. Weems}, journal={2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS)}, year={2016}, pages={864-872} }

In 2011 we published a practical algorithm for short division (division of a multiple-precision dividend by a single-precision divisor) on a parallel processor (HiPC 2011) with a run time of O(n/p + log p). Our algorithm, based on parallel computation of remainder sequences, improves on Takahashi's earlier work (LSSC 2007), which has a run time of O((n/p) log p). Here we prove that Ω(n/p + log p) is a tight lower bound for short division (using a conventional fixed radix number system…
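To illustrate where a bound of the form O(n/p + log p) can come from, here is a minimal Python sketch. It is not the authors' algorithm. It relies only on the fact that pushing a block of digits through short division acts on the incoming remainder as an affine map modulo the divisor, so per-block summaries can be combined with a prefix scan. The radix `BASE`, the function names, and the block layout are assumptions made for this illustration, and the "parallel" phases are simulated sequentially.

```python
from typing import List, Tuple

BASE = 1 << 32  # assumed radix: one "single precision" digit is 32 bits


def short_div(digits: List[int], divisor: int, carry_in: int = 0) -> Tuple[List[int], int]:
    """Sequential short division; digits are most-significant first.

    carry_in is the remainder entering from the more significant digits
    and must be smaller than divisor.
    """
    quotient = []
    r = carry_in
    for d in digits:
        cur = r * BASE + d
        quotient.append(cur // divisor)
        r = cur % divisor
    return quotient, r


def block_summary(digits: List[int], divisor: int) -> Tuple[int, int]:
    """Summarise a block as (a, b): incoming remainder r leaves as (a*r + b) mod divisor."""
    a, b = 1, 0
    for d in digits:
        a = (a * BASE) % divisor
        b = (b * BASE + d) % divisor
    return a, b


def blocked_short_div(digits: List[int], divisor: int, p: int) -> Tuple[List[int], int]:
    """Short division split into at most p blocks (hypothetical helper).

    Phase 1 and phase 3 are independent per block (O(n/p) each on p processors).
    Phase 2 is an exclusive scan over p affine maps; it is written as a loop
    here, but composition of affine maps is associative, so a parallel prefix
    scan could do it in O(log p) steps.
    """
    size = max(1, -(-len(digits) // p))  # ceiling division
    blocks = [digits[i:i + size] for i in range(0, len(digits), size)]

    # Phase 1: per-block summaries.
    summaries = [block_summary(blk, divisor) for blk in blocks]

    # Phase 2: remainder entering each block.
    carries = []
    r = 0
    for a, b in summaries:
        carries.append(r)
        r = (a * r + b) % divisor

    # Phase 3: local short division with the known incoming remainder.
    quotient: List[int] = []
    for blk, carry in zip(blocks, carries):
        q, _ = short_div(blk, divisor, carry)
        quotient.extend(q)
    return quotient, r
```

Calling `blocked_short_div(digits, divisor, p)` should return the same quotient digits and remainder as `short_div(digits, divisor)` for any `p >= 1`; only the order in which the work is done differs.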

#### 2 Citations

Review of Basic Classes of Dividers Based on Division Algorithm

- Computer Science
- IEEE Access
- 2021

The broad classification of dividers into basic classes (digit recurrence, high radix, functional iteration, estimation, look-up table, and variable latency) is described, illustrating that, in practice, many algorithms combine one or more classes and are implemented with different hardware architectures.

A Study of High Performance Multiple Precision Arithmetic on Graphics Processing Units

- Computer Science
- 2018

A study of the impact of multi-modal decision analysis on graphics processing units and how it affects performance and efficiency is published.

#### References

Parallel multiple precision division by a single precision divisor

- Computer Science
- 2011 18th International Conference on High Performance Computing
- 2011

This work combines a parallel version of Jebelean's exact division algorithm with a left-to-right algorithm for computing the borrow chain, to relax the requirement of exactness, and employs Takahashi's recently reported cyclic reduction technique for GPU division to further enhance performance.

An Algorithm for Exact Division

- Computer Science, Mathematics
- J. Symb. Comput.
- 1993

An algorithm is presented that computes the quotient of two long integers when the division is known to be exact, starting from the least-significant digits of the operands, which is better suited for systolic parallelization in a "least-significant digit first" pipelined manner.

Fast recursive division

- Computer Science
- 1998

A new recursive method for division with remainder of integers is presented and practical results of an implementation allow us to say that the authors have the fastest integer division on a SPARC architecture compared to all other integer packages they know of.

On Parallel Prefix Computation

- Computer Science
- Parallel Process. Lett.
- 1994

We prove that prefix sums of n integers of at most b bits can be found on a COMMON CRCW PRAM in time with a linear time-processor product. The algorithm is optimally fast, for any polynomial number…

Modular exponentiation via the explicit Chinese remainder theorem

- Computer Science, Mathematics
- Math. Comput.
- 2007

A new result on the parallel complexity of modular exponentiation is obtained: there is an algorithm for the Common CRCW PRAM that, given positive integers x, e, and m in binary, of total bit length n, computes $x^e \bmod m$ in time $O(n/\lg\lg n)$ using $n^{O(1)}$ processors.

Upper and Lower Time Bounds for Parallel Random Access Machines without Simultaneous Writes

- Computer Science
- SIAM J. Comput.
- 1986

It is shown that even if the authors allow nonuniform algorithms, an arbitrary number of processors, and arbitrary instruction sets, $\Omega (\log n)$ is a lower bound on the time required to compute various simple functions, including sorting n keys and finding the logical “or” of n bits.

A randomized sublinear time parallel GCD algorithm for the EREW PRAM

- Mathematics, Computer Science
- Inf. Process. Lett.
- 2010

We present a randomized parallel algorithm that computes the greatest common divisor of two integers of n bits in length with probability $1 - o(1)$ that takes $O(n \log\log n / \log n)$…

Improved Upper and Lower Time Bounds for Parallel Random Access Machines Without Simultaneous Writes

- Computer Science
- SIAM J. Comput.
- 1991

The time required by a variant of the PRAM to compute a certain class of functions called critical functions (which include the Boolean OR of n bits) is studied, and it is shown that any PRAM which computes a critical function must take at least $0.5 \log n - O(1)$ steps.

Bidirectional Exact Integer Division

- Computer Science, Mathematics
- J. Symb. Comput.
- 1996

It is shown that the high-order part and the low-order part of the exact quotient can be computed independently from each other.

Parallel Algorithms for Shared-Memory Machines

- Computer Science
- Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity
- 1990

This chapter discusses parallel algorithms for shared-memory machines, which focus on the technological limits of today's chips, in which gates and wires are packed into a small number of planar layers.