Sorting Large Data Files on POOMA

Bjørn Arild W. Baugstø, Jarle Fredrik Greipsland, J. Kamerbeek
This paper reports on the results of porting a typical benchmark problem for distributed systems from its original platform [Baug89b] (the HC16 database machine) to POOMA, the Parallel Object Oriented MAchine. The benchmark, a sorting algorithm for a data set distributed over the (sequentially accessible) background devices of a number of processing nodes, was implemented on POOMA at two levels. The first level is the nucleus of the POOMA operating system and the implementation is written…
The POOMA Architecture
An overview of the POOMA hardware architecture and its prototype implementation, as developed within the machine subproject of the PRISMA (PaRallel Inference and Storage MAchine) project, confirming the potential of the hardware architecture as such.
A super scalar sort algorithm for RISC processors
New sort algorithms which eliminate almost all the compares, provide functional parallelism which can be exploited by multiple execution units, significantly reduce the number of passes through keys, and improve data locality are developed.
Relational Algebra Operations
A set of relational algebra operations is described and slightly enhanced for improved efficiency, and the results of the DeWitt join test runs are given.
High-performance sorting on networks of workstations
We report the performance of NOW-Sort, a collection of sorting implementations on a Network of Workstations (NOW). We find that parallel sorting on a NOW is competitive to sorting on the large-scale
GPUTeraSort: high performance graphics co-processor sorting for large database management
Overall, the results indicate that using a GPU as a co-processor can significantly improve the performance of sorting algorithms on large databases.
Communication conscious radix sort
A reorganization of Radix sort is proposed that leads to a highly local version of the algorithm at a very low cost and achieves a good load balance which makes it insensitive to skewed data distributions.
The new challenges are to develop fault tolerant systems and database servers for “new” data types, notably film and video.
Alphasort: A cache-sensitive parallel external sort
A new sort algorithm, called AlphaSort, demonstrates that commodity processors and disks can handle commercial batch workloads and argues that modern architectures require algorithm designers to re-examine their use of the memory hierarchy.
AlphaSort: a RISC machine sort
A new sort algorithm, called AlphaSort, demonstrates that commodity processors and disks can handle commercial batch workloads and proposes two new benchmarks: Minutesort: how much can you sort in a minute, and DollarSort: how to sort for a dollar.
Efficient bundle sorting
An efficient algorithm for bundle sorting in external memory, which requires at most c(N/B) log_{M/B} k disk accesses, and is shown to be optimal by proving a matching lower bound for bundling together identical keys.


A Low Communication Sort Algorithm for a Parallel Database Machine
This work proposes a novel algorithm that exhibits complete parallelism during the sort, merge, and return-to-host phases, and decreases the amount of inter-processor communication compared to existing parallel sort algorithms.
Algebra Operations on a Parallel Computer - Performance Evaluation
The design of a parallel database computer that contains 8 single board computers that communicate over a system of shared RAM, allowing fast communication without interference, and test results are reported.
Sorting Large Files on a Backend Multiprocessor
The results show that, using current off-the-shelf technology coupled with a streamlined distributed operating system, three- and five-microprocessor configurations provide a very cost-effective sort of large files.
Parallel Partition Sort for Database Machines
A new parallel sorting method, called parallel partition sort, is discussed. It is based on top-down partitioning of data, transfers only a small amount of data, and does not place large demands on the CPU.
Parallel Sorting Methods for Large Data Volumes on a Hypercube Database Computer
Two external sorting algorithms for hypercube database computers are presented based on partitioning of data according to partition values obtained through sampling of the data.
Multiprocessor Hash-Based Join Algorithms
It is demonstrated that bit vector filtering provides dramatic improvement in the performance of all algorithms, including the sort-merge join algorithm, and is shown to provide linear increases in throughput with corresponding increases in processor and disk resources.
A deadlock free and starvation free network of packet switching communication processors
A network of communication processors is proved to be free of deadlock and starvation: it is proved that there is guaranteed progress for every packet in the network.
Sorting and Searching
The first revision of this third volume is a survey of classical computer techniques for sorting and searching. It extends the treatment of data structures in Volume 1 to consider both large and
Evaluation of 18-stage Pipeline Hardware Sorter
Since sorting is one of the most fundamental and frequently used operations in current computer systems, we have developed a high-speed hardware sorter.