
Achieving an even load balance with a low communication overhead is a fundamental task in parallel computing. In this paper we consider the problem of partitioning an array into a number of blocks such that the maximum amount of work in any block is as low as possible. We review different proposed schemes for this problem and the complexity of their…
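For nonnegative integer work values, one standard scheme for this problem (not necessarily one of the schemes reviewed in the paper) is to binary-search on the bottleneck value and test each candidate with a greedy "probe" that fills blocks from left to right. A minimal Python sketch, with hypothetical function names `fits` and `min_bottleneck`:

```python
from typing import Sequence


def fits(work: Sequence[int], p: int, bound: int) -> bool:
    """Greedy probe: can `work` be cut into at most p contiguous blocks,
    each with total work <= bound?"""
    blocks, current = 1, 0
    for w in work:
        if w > bound:
            return False  # a single element already exceeds the bound
        if current + w > bound:
            blocks += 1  # close the current block and start a new one
            current = w
            if blocks > p:
                return False
        else:
            current += w
    return True


def min_bottleneck(work: Sequence[int], p: int) -> int:
    """Smallest achievable maximum block work.

    Assumes nonnegative integer work values and 1 <= p <= len(work).
    """
    lo, hi = max(work), sum(work)  # the optimum always lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if fits(work, p, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


print(min_bottleneck([1, 2, 3, 4, 5], 3))  # 6: blocks [1, 2, 3] | [4] | [5]
```

Each probe costs O(n), so the search costs O(n log W) for total work W. Allowing "at most p" blocks is harmless here: with nonnegative work and n ≥ p, a partition with fewer blocks can always be split further without raising the maximum.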

The problem of partitioning a sequence of $n$ real numbers into $p$ intervals is considered. The goal is to find a partition such that the cost of the most expensive interval, measured with a cost function $f$, is minimized. An efficient algorithm that solves the problem in time $O(p(n - p)\log p)$ is developed. The algorithm is based on finding a sequence of feasible…
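The abstract does not spell out the $O(p(n - p)\log p)$ algorithm itself. As a point of comparison, the straightforward dynamic program below solves the same min-max problem with $O(pn^2)$ evaluations of $f$; it is a baseline sketch, not the paper's method. Here `f` is assumed to be any Python function applied to the slice of numbers in one interval, intervals are assumed to be nonempty, and the function name is hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def min_max_partition(
    x: Sequence[float], p: int, f: Callable[[Sequence[float]], float]
) -> Tuple[float, List[Tuple[int, int]]]:
    """Split x into p nonempty consecutive intervals minimizing max f(interval).

    Returns the optimal cost and the intervals as half-open (start, end) pairs.
    """
    n = len(x)
    if not 1 <= p <= n:
        raise ValueError("need 1 <= p <= len(x)")
    INF = math.inf
    # best[k][j]: optimal cost of splitting the prefix x[:j] into k intervals
    best = [[INF] * (n + 1) for _ in range(p + 1)]
    start = [[0] * (n + 1) for _ in range(p + 1)]  # start index of the k-th interval
    best[0][0] = -INF  # empty prefix with no intervals: neutral element for max
    for k in range(1, p + 1):
        # leave at least one element for each of the remaining p - k intervals
        for j in range(k, n - (p - k) + 1):
            for i in range(k - 1, j):
                if best[k - 1][i] == INF:
                    continue  # x[:i] cannot be split into k - 1 intervals
                cand = max(best[k - 1][i], f(x[i:j]))
                if cand < best[k][j]:
                    best[k][j], start[k][j] = cand, i
    # walk back through the stored start indices to recover the intervals
    intervals, j = [], n
    for k in range(p, 0, -1):
        i = start[k][j]
        intervals.append((i, j))
        j = i
    return best[p][n], intervals[::-1]


cost, parts = min_max_partition([1, 2, 3, 4, 5], 3, sum)
print(cost, parts)  # 6 [(0, 3), (3, 4), (4, 5)]
```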

We introduce the class of skew-circulant lattice rules. These are s-dimensional lattice rules that may be generated by the rows of an s × s skew-circulant matrix. (This is a minor variant of the familiar circulant matrix.) We present briefly some of the underlying theory of these matrices and rules. We are particularly interested in finding rules of…
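To make the matrix structure concrete: in a skew-circulant matrix, each row is the previous row shifted one place to the right, with the entry that wraps around to the front changing sign; dropping the sign change gives the ordinary circulant matrix. A small Python sketch (the function name is hypothetical) that builds the matrix from its first row; how the paper turns such a matrix into a lattice rule is not reproduced here.

```python
from typing import List, Sequence


def skew_circulant(first_row: Sequence[int]) -> List[List[int]]:
    """Return the s x s skew-circulant matrix with the given first row."""
    rows = [list(first_row)]
    for _ in range(1, len(first_row)):
        prev = rows[-1]
        # shift right by one; the element wrapping to the front is negated
        rows.append([-prev[-1]] + prev[:-1])
    return rows


print(skew_circulant([1, 2, 3]))  # [[1, 2, 3], [-3, 1, 2], [-2, -3, 1]]
```

Equivalently, entry $(i, j)$ equals $c_{j-i}$ when $j \ge i$ and $-c_{j-i+s}$ when $j < i$, where $c$ is the first row.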

Many problems have multiple layers of parallelism. The outer level may consist of a small number of coarse-grained tasks. Each of these tasks may in turn be rich in parallelism and be split into a number of fine-grained tasks, which again may consist of even finer subtasks, and so on. Here we argue, and demonstrate by examples, that utilizing multiple layers of…

In this paper we discuss the use of nested parallelism. Our claim is that if a problem naturally possesses multiple levels of parallelism, then exploiting parallelism at all levels may significantly enhance the scalability of the algorithm. This claim is supported by numerical experiments. We also discuss how to implement multi-level parallelism using…
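As a structural illustration only, the Python sketch below nests two levels of task parallelism: an outer pool runs a few coarse-grained tasks, and each of those runs its own fine-grained subtasks on an inner pool. The functions and worker counts are hypothetical, and because of CPython's global interpreter lock, threads will not speed up pure-Python arithmetic like this; the point is the two-level decomposition, not the performance.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def fine_task(x: int) -> int:
    """Innermost unit of work (hypothetical)."""
    return x * x


def coarse_task(chunk: List[int], inner_workers: int) -> int:
    """Inner level: one coarse task fans out over its own pool of workers."""
    with ThreadPoolExecutor(max_workers=inner_workers) as inner:
        return sum(inner.map(fine_task, chunk))


def nested_sum_of_squares(data: List[int], outer_workers: int = 2,
                          inner_workers: int = 4) -> int:
    """Outer level: split the data into a few coarse chunks, run them concurrently."""
    chunks = [data[i::outer_workers] for i in range(outer_workers)]
    with ThreadPoolExecutor(max_workers=outer_workers) as outer:
        partials = outer.map(coarse_task, chunks, [inner_workers] * len(chunks))
        return sum(partials)


print(nested_sum_of_squares(list(range(100))))  # 328350
```

For CPU-bound work, the outer level would more typically use separate processes (or nodes, as with MPI) and the inner level threads, but the shape of the decomposition stays the same.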

This paper describes the implementation and underlying philosophy of a large-scale distributed computation of K-optimal lattice rules. The computation is huge, corresponding to the equivalent of 36 years of computation on a single workstation. In this paper we describe our implementation, how we have built in fault tolerance, and our strategy for…

In this paper we describe how to apply fine-grained parallelism to augmenting path algorithms for the dense linear assignment problem. We demonstrate in practice that the technique we suggest can be implemented efficiently on commercially available massively parallel computers. Using $n$ processors, our method reduces the computational complexity from the sequential $O(n^3)$…
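For reference, the sequential baseline can be sketched with the classical $O(n^3)$ shortest augmenting path (Hungarian) algorithm with row and column potentials, shown below in Python. This is a generic textbook variant, not the paper's code, and the function name is hypothetical; the comment marks the scan over all columns, which is a natural candidate for distributing across $n$ processors in a fine-grained scheme.

```python
import math
from typing import List, Sequence, Tuple


def assignment(cost: Sequence[Sequence[float]]) -> Tuple[float, List[int]]:
    """Minimum-cost perfect assignment for a square cost matrix.

    Returns (total cost, col_of_row). Arrays are 1-based inside; index 0 is a sentinel.
    """
    n = len(cost)
    u = [0] * (n + 1)    # row potentials
    v = [0] * (n + 1)    # column potentials
    p = [0] * (n + 1)    # p[j]: row matched to column j (0 = unmatched)
    way = [0] * (n + 1)  # predecessor column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [math.inf] * (n + 1)  # shortest reduced distance to each column
        used = [False] * (n + 1)
        while True:  # grow a shortest-path tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            # O(n) scan over all columns: the data-parallel step
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):  # update potentials by the step length delta
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment: flip matches along the path back to the root
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    col_of_row = [0] * n
    for j in range(1, n + 1):
        col_of_row[p[j] - 1] = j - 1
    return sum(cost[r][col_of_row[r]] for r in range(n)), col_of_row


print(assignment([[4, 1, 3], [2, 0, 5], [3, 2, 2]]))  # (5, [1, 0, 2])
```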

A major search program is described that has been used to determine a set of five-dimensional K-optimal lattice rules of enhanced trigonometric degrees up to 12. The program involved a distributed search, in which approximately 190 CPU-years were shared between more than 1,400 computers in many parts of the world.
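The enhanced trigonometric degree of a lattice rule is the smallest 1-norm $\|h\|_1$ of a nonzero vector $h$ in its dual lattice (one more than the ordinary trigonometric degree). For the special case of a rank-1 rule with $N$ points and generating vector $z$, whose dual lattice is $\{h \in \mathbb{Z}^s : h \cdot z \equiv 0 \pmod N\}$, a brute-force Python sketch follows. The rank-1 restriction and the function name are assumptions for illustration: the rules found by the search need not be rank-1, and exhaustive enumeration like this is far too slow for a real search.

```python
from itertools import product
from typing import Optional, Sequence


def enhanced_degree(z: Sequence[int], n_points: int, max_norm: int) -> Optional[int]:
    """Smallest ||h||_1 over nonzero integer h with h . z == 0 (mod n_points).

    Assumes a rank-1 lattice rule. Only vectors with ||h||_1 <= max_norm are
    examined; returns None if no dual vector lies within that range.
    """
    best = None
    for h in product(range(-max_norm, max_norm + 1), repeat=len(z)):
        norm = sum(abs(c) for c in h)
        if norm == 0 or norm > max_norm:
            continue  # skip h = 0 and vectors outside the 1-norm ball
        if best is not None and norm >= best:
            continue  # cannot improve on the best dual vector found so far
        if sum(hc * zc for hc, zc in zip(h, z)) % n_points == 0:
            best = norm  # h lies in the dual lattice
    return best


print(enhanced_degree([1, 2], 5, 6))  # 3, attained e.g. by h = (1, 2)
```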

Applications are increasingly being executed on computational systems that have hierarchical parallelism. There are several programming paradigms which may be used to adapt a program for execution in such an environment. In this paper, we outline some of the challenges in porting codes to such systems, and describe a programming environment that we are…