Parallel String Sample Sort

@inproceedings{Bingmann2013ParallelSS,
  title={Parallel String Sample Sort},
  author={Timo Bingmann and Peter Sanders},
  booktitle={ESA},
  year={2013}
}
We discuss how string sorting algorithms can be parallelized on modern multi-core shared memory machines. [...] Key method: the algorithm makes effective use of the memory hierarchy, uses additional word-level parallelism, and largely avoids branch mispredictions. Additionally, we parallelize variants of multikey quicksort and radix sort that are also useful in certain situations.
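The distribution idea can be made concrete with a minimal sequential sketch of string sample sort. It is not the parallel pS5 implementation. The names `string_sample_sort` and `_key`, the parameters `k`, `num_splitters`, and `threshold`, and the use of Python's `sorted` for small buckets are all assumptions made for illustration. Each level classifies strings by a `k`-character key read at the current depth, which loosely mirrors packing several characters into one machine word. Keys equal to a splitter get their own bucket, so those strings can skip ahead by `k` characters instead of re-comparing the shared prefix.

```python
import bisect
import random


def _key(s, depth, k):
    # The next k characters starting at `depth` (shorter if the string ends).
    return s[depth:depth + k]


def string_sample_sort(strings, depth=0, k=8, num_splitters=7, threshold=16):
    """Sort strings that all share their first `depth` characters (sketch)."""
    if len(strings) <= threshold:
        return sorted(strings)  # base case; real implementations use e.g. insertion sort
    sample = random.sample(strings, min(num_splitters, len(strings)))
    splitters = sorted({_key(s, depth, k) for s in sample})
    m = len(splitters)
    less = [[] for _ in range(m + 1)]  # keys strictly between consecutive splitters
    equal = [[] for _ in range(m)]     # keys equal to splitters[i]
    for s in strings:
        key = _key(s, depth, k)
        i = bisect.bisect_left(splitters, key)
        if i < m and splitters[i] == key:
            equal[i].append(s)
        else:
            less[i].append(s)
    out = []
    for i in range(m + 1):
        out.extend(string_sample_sort(less[i], depth, k, num_splitters, threshold))
        if i < m:
            if len(splitters[i]) < k:
                out.extend(equal[i])  # key ended early: these strings are identical
            else:
                # Same k characters: continue with the next k characters.
                out.extend(string_sample_sort(equal[i], depth + k, k,
                                              num_splitters, threshold))
    return out
```

Because every splitter is drawn from the input, each "less" bucket is strictly smaller than its parent. An "equal" bucket that does not shrink still advances `depth`, so the recursion terminates.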
Engineering Parallel String Sorting
TLDR
This work proposes string sample sort, a synthesis of the best sequential string sorting algorithms and successful parallel sorting algorithms for atomic objects, and describes sequential LCP-insertion sort, which calculates the LCP array and uses it to accelerate the insertions.
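The idea behind LCP-insertion sort can be sketched as follows. While the sorted prefix is scanned, the stored LCP of neighbouring strings often determines how the new string compares without touching any characters. This is a minimal illustration, not the authors' code. The function names are hypothetical, and the forward scan and Python list insertion are simplifying assumptions.

```python
def _lcp_from(a, b, h):
    """Extend a known common prefix length h of strings a and b."""
    n = min(len(a), len(b))
    while h < n and a[h] == b[h]:
        h += 1
    return h


def _less(x, s, h):
    """Return x < s, given h == LCP(x, s)."""
    if h == len(s):
        return False  # s is a prefix of x (or equal to it), so x >= s
    return h == len(x) or x[h] < s[h]


def lcp_insertion_sort(strings):
    """Return (out, lcp) with out sorted and lcp[j] = LCP(out[j-1], out[j]), lcp[0] = 0."""
    out, lcp = [], []
    for x in strings:
        j = 0
        h = _lcp_from(x, out[0], 0) if out else 0  # invariant: h == LCP(x, out[j])
        hprev = 0                                  # LCP(out[j-1], x); 0 while j == 0
        while j < len(out) and not _less(x, out[j], h):
            hprev = h
            j += 1
            if j < len(out):
                nxt = lcp[j]                       # LCP(out[j-1], out[j]), already known
                if nxt == h:
                    h = _lcp_from(x, out[j], h)    # the only case that reads characters
                else:
                    h = min(nxt, h)                # nxt > h keeps h; nxt < h yields nxt
        out.insert(j, x)
        lcp.insert(j, hprev)                       # LCP with the new predecessor
        if j + 1 < len(out):
            lcp[j + 1] = h                         # LCP(x, its new successor)
    return out, lcp
```

The returned LCP array is exactly what LCP-aware merging or compression steps consume later.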
Communication-Efficient String Sorting
TLDR
These algorithms inspect only the characters needed to determine the sorting order, and they reduce communication volume by sending only those characters and by sending repetitions of the same prefix only once.
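One common way to send a repeated prefix only once is LCP compression (front coding) of a sorted run. Each string is replaced by its LCP with the previous string plus the remaining tail. The sketch below illustrates that general technique under this assumption. The paper's actual message format is not described here, and the names `lcp_compress` and `lcp_decompress` are hypothetical.

```python
def lcp_compress(sorted_strings):
    """Encode a sorted list as (lcp_with_previous, tail) pairs."""
    prev = ""
    encoded = []
    for s in sorted_strings:
        h = 0
        n = min(len(prev), len(s))
        while h < n and prev[h] == s[h]:
            h += 1
        encoded.append((h, s[h:]))  # the shared prefix is not repeated
        prev = s
    return encoded


def lcp_decompress(encoded):
    """Rebuild the strings by reusing the previous string's prefix."""
    out, prev = [], ""
    for h, tail in encoded:
        s = prev[:h] + tail
        out.append(s)
        prev = s
    return out
```

For example, `["car", "cart", "carton"]` encodes to `[(0, "car"), (3, "t"), (4, "on")]`, so only 6 of the 13 characters are transmitted.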
Scalable String and Suffix Sorting: Algorithms, Techniques, and Tools
TLDR
This dissertation focuses on two fundamental sorting problems, string sorting and suffix sorting. It proposes both multiway distribution-based string sorting with string sample sort and multiway merge-based string sorting with LCP-aware merge and mergesort, and engineers and parallelizes both approaches.
Efficient String Sort with Multi-Character Encoding and Adaptive Sampling
TLDR
This work introduces a novel method based on multi-character encoding that can significantly reduce the radix, and it is competitive with or better than pS5, the most recent parallel string sorting algorithm; this comparison demonstrates the scalability of the method.
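As a generic illustration of multi-character encoding, a single integer comparison can stand in for `k` character comparisons. The next `k` bytes are packed big-endian into one integer, and the resulting keys are used as the "digits" of an MSD-style sort. This is not the encoding or sampling scheme of the cited paper, which the summary does not specify. The function names, the default `k`, and the requirement that strings be `bytes` without zero bytes (zero is used as padding) are all assumptions.

```python
def pack_key(s: bytes, depth: int, k: int = 8) -> int:
    """Pack s[depth:depth+k] into one integer, big-endian, zero-padded.

    Assumes s contains no zero bytes, so padding sorts before every real character.
    """
    return int.from_bytes(s[depth:depth + k].ljust(k, b"\0"), "big")


def multichar_msd_sort(strings, depth=0, k=8):
    """MSD-style string sort whose 'digit' is a packed k-character key (sketch)."""
    if len(strings) <= 1:
        return list(strings)
    buckets = {}
    for s in strings:
        buckets.setdefault(pack_key(s, depth, k), []).append(s)
    out = []
    for key in sorted(buckets):  # integer order equals lexicographic order of the chunks
        group = buckets[key]
        if key & 0xFF == 0:
            out.extend(group)  # last byte is padding: the strings ended, so they are equal
        else:
            out.extend(multichar_msd_sort(group, depth + k, k))
    return out
```

With 8-bit characters and `k = 8`, each key fits in a 64-bit word, which is the usual motivation for this kind of packing.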
In-place Parallel Super Scalar Samplesort (IPS⁴o)
We present a sorting algorithm that works in-place, executes in parallel, is cache-efficient, avoids branch mispredictions, and performs work O(n log n) for arbitrary inputs with high probability.
How Many CPU Cores is an FPGA Worth? Lessons Learned from Accelerating String Sorting on a CPU-FPGA System
TLDR
pHS5 extends pS5, the state-of-the-art string sorting algorithm for multi-core shared memory CPUs, by adding multiple processing elements (PEs) on the FPGA. It extends the job scheduling mechanism of pS5 to schedule the accelerable kernel not only among the available CPU cores but also on the PEs, while keeping the complex high-level control flow and the sorting of smaller data sets on the CPU.
Analysis of Parallel and Sequential Radix Sort for Graph Exploration using OpenMP and CUDA: A Review
In this paper we analyze and compare the radix sort algorithm in sequential and parallel implementations across three programming platforms, namely C, OpenMP-based C++, and CUDA programming [...]
FPGAs in the Datacenters: the Case of Parallel Hybrid Super Scalar String Sample Sort
TLDR
This paper presents Parallel Hybrid Super Scalar String Sample Sort (pHS5) on Intel HARPv2, a heterogeneous CPU-FPGA system with a server-grade multi-core CPU. It extends the job scheduling mechanism of pS5 so that the PEs compete with the CPU cores for processing the accelerable kernel.
Engineering faster sorters for small sets of items
TLDR
The results show the potential of conditional moves in sorting algorithms: when sorting small sets of integers, sorting networks outperform insertion sort.
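A sorting network executes the same fixed sequence of compare-exchange steps for every input. That property is what lets compiled code replace each step with branch-free min/max or conditional-move instructions. The sketch below only shows the structure. `NETWORK_4` is the standard optimal five-comparator network for four inputs, and `sort_network` is a hypothetical helper. Python's `min` and `max` do not correspond to any particular machine instruction, so the conditional-move benefit applies only to compiled implementations.

```python
# Standard optimal sorting network for 4 inputs: pairs of positions to compare-exchange.
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def sort_network(values, network=NETWORK_4):
    """Apply a fixed compare-exchange sequence; no step depends on earlier outcomes."""
    v = list(values)
    for i, j in network:
        a, b = v[i], v[j]
        # Compare-exchange: in C this pair is typically compiled to branch-free code.
        v[i], v[j] = min(a, b), max(a, b)
    return v
```

For example, `sort_network([3, 1, 4, 2])` returns `[1, 2, 3, 4]` after exactly five compare-exchanges, independent of the input order.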
Performance Improvement for Multi-Key Quick Sort using Kepler GPUs
This paper presents the performance improvement obtained by multikey quicksort on NVIDIA Kepler GPUs. Since multikey quicksort is a recursion-based algorithm, many researchers have found it [...]

References

Showing 1-10 of 28 references
Cache Efficient Radix Sort for String Sorting
TLDR
CRadix sort causes fewer cache misses than MSD radix sort by uniquely associating a small block of main memory, called the key buffer, with each key and temporarily storing a portion of each key in its corresponding key buffer.
Super Scalar Sample Sort
TLDR
The main algorithmic insight is that element comparisons can be decoupled from expensive conditional branching using predicated instructions, which facilitates optimizations like loop unrolling and software pipelining.
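The decoupling of comparisons from branching can be illustrated with a simplified classification step. The k - 1 sorted splitters are laid out as an implicit binary search tree (index 1 is the root, and the children of i are 2i and 2i + 1). Each element then descends log k levels by adding the comparison result to the index, rather than branching on it. Super Scalar Sample Sort performs this in compiled code with predicated instructions. This sketch omits its equality buckets, oversampling, and unrolling, and the names `build_tree`, `classify`, and `distribute` are hypothetical.

```python
def build_tree(splitters):
    """Lay out k-1 sorted splitters (k a power of two) as a 1-indexed implicit BST."""
    k = len(splitters) + 1
    assert k & (k - 1) == 0, "need 2^d - 1 splitters"
    tree = [None] * k  # tree[0] is unused
    it = iter(splitters)

    def fill(i):
        if i < k:  # in-order traversal assigns splitters in sorted order
            fill(2 * i)
            tree[i] = next(it)
            fill(2 * i + 1)

    fill(1)
    return tree


def classify(x, tree, log_k):
    """Return the number of splitters < x, descending the tree without an if/else."""
    i = 1
    for _ in range(log_k):
        i = 2 * i + (x > tree[i])  # the comparison result is used as an index offset
    return i - (1 << log_k)


def distribute(data, splitters):
    """One sample-sort distribution step into k buckets."""
    log_k = (len(splitters) + 1).bit_length() - 1
    tree = build_tree(splitters)
    buckets = [[] for _ in range(1 << log_k)]
    for x in data:
        buckets[classify(x, tree, log_k)].append(x)
    return buckets
```

Because the loop in `classify` runs exactly log k times for every element, it can be unrolled and several elements can be classified in an interleaved fashion, which corresponds to the loop unrolling and software pipelining mentioned above.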
A simple, fast parallel implementation of Quicksort and its performance evaluation on SUN Enterprise 10000
  • P. Tsigas, Y. Zhang
  • Eleventh Euromicro Conference on Parallel, Distributed and Network-Based Processing, 2003. Proceedings.
  • 2003
TLDR
This work implements sample sort and a parallel version of Quicksort on a cache-coherent shared-address-space multiprocessor, the SUN Enterprise 10000, and shows that parallel Quicksort outperforms sample sort.
Engineering Radix Sort for Strings
TLDR
These implementations are significantly faster than previous MSD radix sort implementations, and in fact faster than any other string sorting algorithm on several data sets; a new variant achieves high space efficiency at a small additional cost in running time.
Cache-conscious sorting of large sets of strings with dynamic tries
TLDR
This work proposes burstsort, a new sorting algorithm for strings that is based on the dynamic construction of a compact trie in which strings are kept in buckets; the algorithm is simple, fast, and efficient.
Engineering Radix Sort
TLDR
Three ways to sort strings by bytes from left to right (a stable list sort, a stable two-array sort, and an in-place "American flag" sort) are illustrated with practical C programs. All three perform comparably, usually running at least twice as fast as a good quicksort.
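The in-place "American flag" variant can be summarized as follows. Count the bucket sizes for the current byte, then permute elements into their buckets by swapping, which avoids a second array. This is a recursive Python sketch of that technique, not the C program from the paper. The function names are hypothetical, the input is assumed to be a list of `bytes`, and bucket 0 is reserved for strings that end at the current depth.

```python
def _bucket(s, depth):
    # 0 means "string ended here"; otherwise byte value + 1 (1..256).
    return s[depth] + 1 if depth < len(s) else 0


def american_flag_sort(a, lo=0, hi=None, depth=0):
    """In-place MSD radix sort of a[lo:hi] (list of bytes) on byte `depth` (sketch)."""
    if hi is None:
        hi = len(a)
    if hi - lo < 2:
        return
    count = [0] * 257
    for i in range(lo, hi):
        count[_bucket(a[i], depth)] += 1
    start, pos = [0] * 257, lo
    for b in range(257):  # prefix sums give each bucket's first slot
        start[b] = pos
        pos += count[b]
    nxt = start[:]  # next unfilled slot of each bucket
    for b in range(257):
        end = start[b] + count[b]
        while nxt[b] < end:
            c = _bucket(a[nxt[b]], depth)
            if c == b:
                nxt[b] += 1  # already in the right bucket
            else:
                j = nxt[c]  # swap the element into the next free slot of bucket c
                a[nxt[b]], a[j] = a[j], a[nxt[b]]
                nxt[c] += 1
    for b in range(1, 257):  # bucket 0 holds identical, finished strings
        if count[b] > 1:
            american_flag_sort(a, start[b], start[b] + count[b], depth + 1)
```

Every swap places at least one element in its final bucket, so the permutation phase performs at most n swaps per level.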
Cache-efficient string sorting using copying
TLDR
This work introduces C-burstsort, which copies the unexamined tail of each key to its bucket and discards the original key to improve data locality. The authors show that sorting is typically twice as fast as with the original burstsort and four to five times faster than with multikey quicksort and previous radix sorts.
Engineering a Multi-core Radix Sort
We present a fast radix sorting algorithm that builds upon a microarchitecture-aware variant of counting sort. Taking advantage of virtual memory and making use of write-combining yields a per-pass [...]
Fast algorithms for sorting and searching strings
TLDR
This work presents theoretical algorithms for sorting and searching multikey data, derives from them practical C implementations for applications in which keys are character strings, and presents extensions to more complex string problems, such as partial-match searching.
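Multikey quicksort, the string sorting algorithm from this line of work, performs a three-way partition on a single character position. Strings whose character is less than or greater than the pivot character stay at the same depth, while the "equal" part moves on to the next character. This sketch is illustrative only and is not the paper's C code. The helper names are hypothetical, and choosing the middle element as pivot is a simplifying assumption (practical versions use better pivot selection and switch to insertion sort for small ranges).

```python
def _ch(s, d):
    # Character code at position d, or -1 past the end (end sorts first).
    return ord(s[d]) if d < len(s) else -1


def multikey_quicksort(a, lo=0, hi=None, d=0):
    """Sort a[lo:hi] in place, assuming the strings there share their first d chars."""
    if hi is None:
        hi = len(a)
    if hi - lo < 2:
        return
    pivot = _ch(a[(lo + hi) // 2], d)
    # Invariant: a[lo:lt] < pivot, a[lt:i] == pivot, a[gt:hi] > pivot (at position d).
    lt, i, gt = lo, lo, hi
    while i < gt:
        c = _ch(a[i], d)
        if c < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif c > pivot:
            gt -= 1
            a[gt], a[i] = a[i], a[gt]
        else:
            i += 1
    multikey_quicksort(a, lo, lt, d)
    if pivot >= 0:  # equal part: compare the next character
        multikey_quicksort(a, lt, gt, d + 1)
    multikey_quicksort(a, gt, hi, d)
```

Each character is compared against a pivot character only, so common prefixes are not re-scanned, which is the advantage over plain quicksort on strings.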
A comparison of sorting algorithms for the connection machine CM-2
TLDR
A fast sorting algorithm for the Connection Machine Supercomputer model CM-2 is developed, and it is shown that any Θ(lg n)-depth family of sorting networks can be used to sort n numbers in Θ(lg n) time in the bounded-degree fixed interconnection network domain.