# Parallel String Sample Sort

@inproceedings{Bingmann2013ParallelSS, title={Parallel String Sample Sort}, author={Timo Bingmann and Peter Sanders}, booktitle={ESA}, year={2013} }

We discuss how string sorting algorithms can be parallelized on modern multi-core shared memory machines. [...] The algorithm makes effective use of the memory hierarchy, uses additional word level parallelism, and largely avoids branch mispredictions. Additionally, we parallelize variants of multikey quicksort and radix sort that are also useful in certain situations.

## 16 Citations

Engineering Parallel String Sorting

- Computer Science
- Algorithmica
- 2015

This work proposes string sample sort, a synthesis of the best sequential string sorting algorithms and successful parallel sorting algorithms for atomic objects, and describes sequential LCP-insertion sort which calculates the LCP array and accelerates its insertions using it.

Communication-Efficient String Sorting

- Computer Science
- 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
- 2020

These algorithms inspect only the characters needed to determine the sorting order, and they reduce communication volume by communicating only those characters and by communicating repetitions of the same prefixes only once.

Scalable String and Suffix Sorting: Algorithms, Techniques, and Tools

- Computer Science, Mathematics
- ArXiv
- 2018

This dissertation focuses on two fundamental sorting problems, string sorting and suffix sorting. It proposes both multiway distribution-based string sorting with string sample sort and multiway merge-based string sorting with LCP-aware merge and mergesort, and it engineers and parallelizes both approaches.

Efficient String Sort with Multi-Character Encoding and Adaptive Sampling

- Computer Science
- SIGMOD Conference
- 2021

This work introduces a novel multi-character encoding based method that can significantly reduce the radix and is competitive with or better than pS5, the most recent parallel string sorting algorithm, which demonstrates the scalability of the method.

In-place Parallel Super Scalar Samplesort (IPS$^4$o)

- Computer Science
- 2017

We present a sorting algorithm that works in-place, executes in parallel, is cache-efficient, avoids branch-mispredictions, and performs work O(n log n) for arbitrary inputs with high probability.…

How Many CPU Cores is an FPGA Worth? Lessons Learned from Accelerating String Sorting on a CPU-FPGA System

- Computer Science
- J. Signal Process. Syst.
- 2021

pHS5 extends pS5, the state-of-the-art string sorting algorithm for multi-core shared memory CPUs, by adding multiple processing elements (PEs) on the FPGA and by extending the job scheduling mechanism of pS5 to schedule the accelerable kernel not only among the available CPU cores but also on the authors' PEs, while retaining the complex high-level control flow and the sorting of the smaller data sets on the CPU.

Analysis of Parallel and Sequential Radix Sort for Graph Exploration using OpenMP and CUDA: A Review

- 2016

In this paper we have analyzed the comparison of radix sort algorithm on sequential and parallel procedures across three programming language platforms namely C, OpenMP based C++ and CUDA programming…

FPGAs in the Datacenters: the Case of Parallel Hybrid Super Scalar String Sample Sort

- Computer Science
- 2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP)
- 2020

This paper presents Parallel Hybrid Super Scalar String Sample Sort (pHS5) on Intel HARPv2, a heterogeneous CPU-FPGA system with a server-grade multi-core CPU, and extends the job scheduling mechanism of pS5 to enable the PEs to compete with the CPU cores for processing the accelerable kernel.

Engineering faster sorters for small sets of items

- Computer Science
- Softw. Pract. Exp.
- 2021

The results clearly show the potential of using conditional moves in the field of sorting algorithms, as when sorting only small sets of integers, the sorting networks outperform insertion sort.

Performance Improvement for Multi-Key Quick Sort using Kepler GPUs

- 2015

This paper presents the performance improvement obtained by multi key quick sort on Kepler NVIDIA GPUs. Since Multi key quick sort is a recursion based algorithm many of the researchers have found it…

## References

Cache Efficient Radix Sort for String Sorting

- Computer Science
- IEICE Trans. Fundam. Electron. Commun. Comput. Sci.
- 2007

CRadix sort causes fewer cache misses than MSD radix sort by uniquely associating a small block of main memory called the key buffer to each key and temporarily storing a portion of each key into its corresponding key buffer.

Super Scalar Sample Sort

- Physics, Computer Science
- ESA
- 2004

The main algorithmic insight is that element comparisons can be decoupled from expensive conditional branching using predicated instructions, which facilitates optimizations like loop unrolling and software pipelining.
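
The idea of replacing a branch with arithmetic on the comparison result can be illustrated with a small sketch. It assumes the splitters are stored as an implicit binary search tree (heap-style, 1-based indices) and that the number of buckets is a power of two. The names `build_splitter_tree` and `classify` are hypothetical and not taken from the paper. In Python the interpreter itself still branches, so the sketch only shows the index arithmetic that a compiled implementation could map to predicated instructions.

```python
def build_splitter_tree(splitters):
    """Store sorted splitters as an implicit binary search tree (1-based)."""
    k = len(splitters) + 1                  # number of buckets
    assert k & (k - 1) == 0, "sketch assumes 2^d - 1 splitters"
    tree = [None] * k                       # tree[0] stays unused

    def fill(node, lo, hi):
        # Place the median of splitters[lo:hi] at this node, then recurse.
        if lo >= hi:
            return
        mid = (lo + hi) // 2
        tree[node] = splitters[mid]
        fill(2 * node, lo, mid)
        fill(2 * node + 1, mid + 1, hi)

    fill(1, 0, len(splitters))
    return tree


def classify(x, tree):
    """Return the bucket index of x without an if/else on the comparison."""
    k = len(tree)
    j = 1
    while j < k:                            # fixed number of levels: log2(k)
        # The comparison result (0 or 1) is added to the index directly,
        # so the descent needs no data-dependent branch.
        j = 2 * j + (x > tree[j])
    return j - k                            # leaves are numbered k .. 2k-1


tree = build_splitter_tree([10, 20, 30])
buckets = [classify(x, tree) for x in [5, 10, 15, 20, 25, 30, 35]]
```

With these three splitters, keys less than or equal to 10 fall into bucket 0 and keys greater than 30 fall into bucket 3.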

A simple, fast parallel implementation of Quicksort and its performance evaluation on SUN Enterprise 10000

- Computer Science
- Eleventh Euromicro Conference on Parallel, Distributed and Network-Based Processing, 2003. Proceedings.
- 2003

This work implements sample sort and a parallel version of Quicksort on a cache-coherent shared address space multiprocessor, the SUN ENTERPRISE 10000, and shows that parallel quicksort outperforms sample sort.

Engineering Radix Sort for Strings

- Computer Science
- SPIRE
- 2008

These implementations are significantly faster than previous MSD radix sort implementations, and in fact faster than any other string sorting algorithm on several data sets, and a new variant achieves high space-efficiency at a small additional cost on runtime.

Cache-conscious sorting of large sets of strings with dynamic tries

- Computer Science
- JEAL
- 2004

This work proposes a new sorting algorithm for strings, burstsort, based on dynamic construction of a compact trie in which strings are kept in buckets, which is simple, fast, and efficient.

Engineering Radix Sort

- Computer Science
- Comput. Syst.
- 1993

Three ways to sort strings by bytes left to right (a stable list sort, a stable two-array sort, and an in-place "American flag" sort) are illustrated with practical C programs, and all three perform comparably, usually running at least twice as fast as a good quicksort.

Cache-efficient string sorting using copying

- Computer Science
- ACM J. Exp. Algorithmics
- 2006

C-burstsort is introduced, which copies the unexamined tail of each key to the bucket and discards the original key to improve data locality. Sorting with it is shown to be typically twice as fast as the original burstsort and four to five times faster than multikey quicksort and previous radixsorts.

Engineering a Multi-core Radix Sort

- Computer Science
- Euro-Par
- 2011

We present a fast radix sorting algorithm that builds upon a microarchitecture-aware variant of counting sort. Taking advantage of virtual memory and making use of write-combining yields a per-pass…

Fast algorithms for sorting and searching strings

- Computer Science
- SODA '97
- 1997

This work presents theoretical algorithms for sorting and searching multikey data, derives from them practical C implementations for applications in which keys are character strings, and presents extensions to more complex string problems, such as partial-match searching.
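
As a rough illustration of sorting multikey data one character at a time, the following is a minimal sketch of multikey (three-way radix) quicksort. It is a simplification under stated assumptions: it builds new lists instead of partitioning in place with swaps as practical C implementations would, it picks the middle element as the pivot, and the function name `multikey_quicksort` is chosen here for illustration.

```python
def multikey_quicksort(strings, depth=0):
    """Sort strings that already share their first `depth` characters."""
    if len(strings) <= 1:
        return list(strings)

    def char_at(s):
        # Use -1 as an end-of-string marker that sorts before every character.
        return ord(s[depth]) if depth < len(s) else -1

    pivot = char_at(strings[len(strings) // 2])

    # Three-way partition on the single character at position `depth`.
    less = [s for s in strings if char_at(s) < pivot]
    equal = [s for s in strings if char_at(s) == pivot]
    greater = [s for s in strings if char_at(s) > pivot]

    # Strings in `equal` agree on one more character, so advance the depth.
    # If the pivot is the end marker, they are identical and need no work.
    if pivot >= 0:
        equal = multikey_quicksort(equal, depth + 1)

    return (multikey_quicksort(less, depth)
            + equal
            + multikey_quicksort(greater, depth))


words = ["banana", "band", "ban", "apple", "", "bandana", "apple"]
result = multikey_quicksort(words)
```

Because characters are compared by code point, the result agrees with Python's built-in `sorted` on the same input.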

A comparison of sorting algorithms for the connection machine CM-2

- Computer Science
- SPAA '91
- 1991

A fast sorting algorithm for the Connection Machine Supercomputer model CM-2 is developed, and it is shown that any O(lg n)-depth family of sorting networks can be used to sort n numbers in O(lg n) time in the bounded-degree fixed interconnection network domain.