Communication efficient algorithms for fundamental big data problems

@inproceedings{Sanders2013CommunicationEA,
  title={Communication efficient algorithms for fundamental big data problems},
  author={Peter Sanders and Sebastian Schlag and Ingo M{\"u}ller},
  booktitle={2013 IEEE International Conference on Big Data},
  year={2013},
  pages={15--23}
}
Big Data applications often store or obtain their data distributed over many computers connected by a network. Since the network is usually slower than the local memory of the machines, it is crucial to process the data in such a way that not too much communication takes place. Indeed, only communication volume sublinear in the input size may be affordable. We believe that this direction of research deserves more intensive study. We give examples for several fundamental algorithmic problems… 

Communication Efficient Algorithms for Distributed OLAP Query Execution
TLDR
A technique to find a better partitioning of the tables in a database to allow the execution of joins without communication effort, and an algorithm that selects the first k tuples of the result set of a query with a communication effort independent from the size of the database.
Practical Massively Parallel Sorting
TLDR
The algorithms are multi-level generalizations of the known algorithms sample sort and multiway mergesort, which turn out to be very scalable both in theory and in practice, scaling up to $2^{15}$ MPI processes with outstanding performance, in particular for medium-sized inputs.
Communication Efficient Algorithms for Top-k Selection Problems
We present scalable parallel algorithms with sublinear per-processor communication volume and low latency for several fundamental problems related to finding the most relevant elements in a set, for
Communication Efficient Checking of Big Data Operations
TLDR
These checkers cover many of the commonly used operations, including sum, average, median, and minimum aggregation, as well as sorting, union, merge, and zip, to check the correctness of operations in Big Data processing frameworks and distributed databases.
Parallel Weighted Random Sampling
TLDR
This work gives efficient, fast, and practicable algorithms for sampling single items, $k$ items with/without replacement, permutations, subsets, and reservoirs, and improved sequential algorithms for alias table construction and for sampling with replacement.
Bloom Filters for ReduceBy, GroupBy and Join in Thrill
TLDR
An augmented version of the detection algorithm that detects, for each key, the worker with the highest number of total occurrences; that worker is then chosen as the shuffle target for that key in the Reduce operation.
Communication-Efficient String Sorting
TLDR
These algorithms inspect only characters that are needed to determine the sorting order and communication volume is reduced by also communicating only those characters and by communicating repetitions of the same prefixes only once.
Efficient Parallel Random Sampling—Vectorized, Cache-Efficient, and Online
TLDR
A simple divide-and-conquer scheme is proposed that makes sequential algorithms more cache efficient and leads to a parallel algorithm running in expected time O(n/p+log p) on p processors, i.e., scales to massively parallel machines even for moderate values of n.
Robust Massively Parallel Sorting
TLDR
This work investigates distributed memory parallel sorting algorithms that scale to the largest available machines and are robust with respect to input size and distribution of the input elements and designs a new variant of quicksort with fast high-quality pivot selection.
Connecting MapReduce Computations to Realistic Machine Models
P. Sanders. Computer Science. 2020 IEEE International Conference on Big Data (Big Data), 2020.
This paper explains how the popular, highly abstract MapReduce model of parallel computation (MRC/MPC) can be rooted in reality by showing how to execute MapReduce computations robustly and

References

Improving distributed join efficiency with extended bloom filter operations
TLDR
This paper presents extensions of Bloom filter operations that apply to a wide range of uses in which Bloom filters serve as a compressed set representation, and points out how they improve the performance of such distributed joins.
Fundamental parallel algorithms for private-cache chip multiprocessors
TLDR
This paper presents two sorting algorithms, a distribution sort and a mergesort, and studies sorting lower bounds in a computational model, which is called the parallel external-memory (PEM) model, that formalizes the essential properties of the algorithms for private-cache CMPs.
One is enough: distributed filtering for duplicate elimination
TLDR
A suite of distributed Bloom filters that exploit different ways of partitioning the event space to address the continuous nature of event delivery and are extended to support sliding window semantics.
Communication lower bounds and optimal algorithms for programs that reference arrays - Part 1
TLDR
This work generalizes the lower bound approach used initially for $\Theta(N^3)$ matrix multiplication to a much larger class of algorithms that may have arbitrary numbers of loops and arrays with arbitrary dimensions, as long as the index expressions are affine combinations of loop variables.
Distributed Duplicate Removal
The distributed duplicate removal problem is concerned with the detection and subsequent elimination of all duplicate elements in a given multiset that is distributed over several computers connected
Data streams: algorithms and applications
TLDR
Data Streams: Algorithms and Applications surveys the emerging area of algorithms for processing data streams and associated applications, which rely on metric embeddings, pseudo-random computations, sparse approximation theory and communication complexity.
Theory and Practice of Bloom Filters for Distributed Systems
TLDR
An overview of the basic and advanced probabilistic techniques is given, reviewing over 20 variants and discussing their application in distributed systems, in particular for caching, peer-to-peer systems, routing and forwarding, and measurement data summarization.
Efficient Parallel Graph Algorithms for Coarse-Grained Multicomputers and BSP
TLDR
The algorithms for Problems (1)-(7) are the first practically relevant parallel algorithms for these standard graph problems, and the number of communication rounds/supersteps obtained in this paper is independent of the problem size and grows only logarithmically with respect to p.
Efficient Parallel Graph Algorithms For Coarse Grained Multicomputers and BSP
TLDR
The algorithms presented are the first practically relevant deterministic parallel algorithms for these problems that can be used on commercially available coarse-grained parallel machines, and they are viewed as an important step towards the final goal of O(1) communication rounds.
Fast, Small, Simple Rank/Select on Bitmaps
TLDR
This paper presents two structures, one using the bitmap in plain form and another using a compressed form, that are simple to implement and combine much lower space overheads than previous work with excellent time performance for rank and select queries.