# Communication conscious radix sort

@inproceedings{JimnezGonzlez1999CommunicationCR, title={Communication conscious radix sort}, author={Daniel Jim{\'e}nez-Gonz{\'a}lez and Josep-Llu{\'i}s Larriba-Pey and Juan J. Navarro}, booktitle={ICS '99}, year={1999} }

The exploitation of data locality in parallel computers is paramount to reduce the memory traffic and communication among processing nodes. We focus on the exploitation of locality by Parallel Radix sort. The original Parallel Radix sort has several communication steps in which one sorting key may have to visit several processing nodes. In response to this, we propose a reorganization of Radix sort that leads to a highly local version of the algorithm at a very low cost. As a key feature, our…
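For orientation, the following is a minimal sequential sketch of a least-significant-digit radix sort, the baseline that parallel radix sorts typically build on. It is not the authors' algorithm; the function name `lsd_radix_sort` and the parameters `key_bits` and `digit_bits` are assumptions made for illustration.

```python
def lsd_radix_sort(keys, key_bits=32, digit_bits=8):
    """Sort non-negative integers below 2**key_bits (illustrative sketch)."""
    radix = 1 << digit_bits
    mask = radix - 1
    src = list(keys)
    # One pass per digit, starting from the least significant digit.
    for shift in range(0, key_bits, digit_bits):
        # Histogram: count how many keys carry each digit value.
        counts = [0] * radix
        for k in src:
            counts[(k >> shift) & mask] += 1
        # Exclusive prefix sum: first output slot of each bucket.
        offsets = [0] * radix
        total = 0
        for d in range(radix):
            offsets[d] = total
            total += counts[d]
        # Stable permutation: every key is moved once in every pass.
        dst = [0] * len(src)
        for k in src:
            d = (k >> shift) & mask
            dst[offsets[d]] = k
            offsets[d] += 1
        src = dst
    return src
```

Every key is moved once per digit pass. If the buckets were spread across processing nodes, each of those moves could send the key to a different node, which is presumably the repeated communication the abstract describes.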

## 11 Citations

The effect of local sort on parallel sorting algorithms

- Computer Science
- Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing
- 2002

There are three important contributions in SCS-Radix sort: first, the work saved by detecting data skew dynamically; second, the exploitation of the memory hierarchy done by the algorithm; and third, the execution time stability of SCS-Radix when sorting data sets with different characteristics.

Fast parallel in-memory 64-bit sorting

- Computer Science
- ICS '01
- 2001

A new algorithm that is more than 2 times faster than the previous fastest 64-bit parallel sorting algorithm, PCS-Radix sort, which adapts to any parallel computer by changing three simple algorithmic parameters.

CC-Radix: a cache conscious sorting based on Radix sort

- Computer Science
- Eleventh Euromicro Conference on Parallel, Distributed and Network-Based Processing, 2003. Proceedings.
- 2003

CC-Radix improves data locality by dynamically partitioning the data set into subsets that fit in the L2 cache.

Improving Communication Sensitive Parallel Radix Sort for Unbalanced Data

- Computer Science
- Euro-Par
- 2003

An efficient improvement is presented which helps to overcome the problems with unbalanced data characteristics and is tested practically on a Linux-based SMP cluster.

Sorting on the SGI Origin 2000: comparing MPI and shared memory implementations

- Computer Science
- Proceedings. SCCC'99 XIX International Conference of the Chilean Computer Science Society
- 1999

This paper analyses the C³-Radix (Communication- and Cache-Conscious Radix) sort algorithm, using the distributed and the shared memory parallel programming models, and explains the reasons for the different behaviours depending on the size of the data sets.

SIMD- and Cache-Friendly Algorithm for Sorting an Array of Structures

- Computer Science
- Proc. VLDB Endow.
- 2015

This paper describes a new algorithm, based on multiway mergesort, for sorting an array of structures by efficiently exploiting the SIMD instructions and cache memory of today's processors, and shows that this approach exhibited up to 2.1x better single-thread performance than the key-index approach implemented with SIMD instructions when sorting 512M 16-byte records on one core.

Automatic generation of a parallel sorting algorithm

- Computer Science
- 2008 IEEE International Symposium on Parallel and Distributed Processing
- 2008

Preliminary experimental results show that the automatic generation of a distributed memory parallel sorting routine provides up to a fourfold improvement over standard parallel algorithms with typical parameters.

Designing parallel algorithms for SMP clusters

- Computer Science
- 2003

Methods for designing and optimizing parallel algorithms for SMP clusters, which combine two different concepts, offer an alternative way to adapt the algorithms to the hierarchical environment.

How SIMD width affects energy efficiency: A case study on sorting

- Computer Science
- 2016 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS XIX)
- 2016

The results show that SIMD can reduce power in addition to enhancing the performance, especially when the memory bandwidth is not sufficient to fully drive the cores.

## References

Fast parallel in-memory 64-bit sorting

- Computer Science
- ICS '01
- 2001

A new algorithm that is more than 2 times faster than the previous fastest 64-bit parallel sorting algorithm, PCS-Radix sort, which adapts to any parallel computer by changing three simple algorithmic parameters.

Load balanced parallel radix sort

- Computer Science
- ICS '98
- 1998

Experimental results indicate that balanced radix sort can sort 0.5G integers in 20 seconds and 128M doubles in 15 seconds on a 64-processor SP2-WN while yielding over 40-fold speedup.

Sorting on the SGI Origin 2000: comparing MPI and shared memory implementations

- Computer Science
- Proceedings. SCCC'99 XIX International Conference of the Chilean Computer Science Society
- 1999

This paper analyses the C³-Radix (Communication- and Cache-Conscious Radix) sort algorithm, using the distributed and the shared memory parallel programming models, and explains the reasons for the different behaviours depending on the size of the data sets.

Adapting Radix Sort to the Memory Hierarchy

- Computer Science
- JEAL
- 2001

The importance of reducing misses in the translation-lookaside buffer (TLB) for obtaining good performance on modern computer architectures is demonstrated, and three techniques which simultaneously reduce cache and TLB misses for LSB radix sort are given: reducing working set size, explicit block transfer, and pre-sorting.

A Benchmark Parallel Sort for Shared Memory Multiprocessors

- Computer Science
- IEEE Trans. Computers
- 1988

The first parallel sort algorithm for shared memory MIMD (multiple-instruction-multiple-data-stream) multiprocessors that has a theoretical and measured speedup near linear is exhibited. It is based…

A super scalar sort algorithm for RISC processors

- Computer Science
- SIGMOD '96
- 1996

New sort algorithms which eliminate almost all the compares, provide functional parallelism which can be exploited by multiple execution units, significantly reduce the number of passes through keys, and improve data locality are developed.

An analysis of superscalar sorting algorithms on an R8000 processor

- Computer Science
- Proceedings 17th International Conference of the Chilean Computer Science Society
- 1997

The analysis indicates that Radix sort is the most promising of the methods studied here for future superscalar architectures and that the use of combined methods does not help to exploit locality.

Parallel algorithms for personalized communication and sorting with an experimental study (extended abstract)

- Computer Science
- SPAA '96
- 1996

A novel variation on sample sort which uses only two rounds of regular all-to-all personalized communication in a scheme that yields very good load balancing with virtually no overhead, and whose performance is invariant over the set of input distributions, unlike previous efficient algorithms.

The Block Distributed Memory Model

- Computer Science
- IEEE Trans. Parallel Distributed Syst.
- 1996

This work introduces a computation model for developing and analyzing parallel algorithms on distributed memory machines and shows that most of these algorithms achieve optimal or near optimal communication complexity while simultaneously guaranteeing an optimal speed-up in computational complexity.

Design, analysis, and implementation of parallel external sorting algorithms

- Computer Science
- 1981

A modified merge-sort is proposed as a method for eliminating duplicate records in a large file, and a combinatorial model is developed to provide an accurate estimate for the cost of the duplicate elimination operation (both in the serial and the parallel cases).