Sebastian Schlag

We present the design and a first performance evaluation of Thrill – a prototype of a general purpose big data processing framework with a convenient data-flow style programming interface. Thrill is somewhat similar to Apache Spark and Apache Flink with at least two main differences. First, Thrill is based on C++ which enables performance advantages due to …
Big Data applications often store or obtain their data distributed over many computers connected by a network. Since the network is usually slower than the local memory of the machines, it is crucial to process the data in such a way that not too much communication takes place. Indeed, only communication volume sublinear in the input size may be affordable. …
We develop a multilevel algorithm for hypergraph partitioning that contracts the vertices one at a time. Using several caching and lazy-evaluation techniques during coarsening and refinement, we reduce the running time by up to two orders of magnitude compared to a naive n-level algorithm that would be adequate for ordinary graph partitioning. The overall …
Hypergraphs are generalizations of graphs where an edge can consist of more than two nodes. A recurring task is to divide the node set of a hypergraph into k different non-empty parts while simultaneously minimizing a partitioning objective. This problem is called the k-way hypergraph partitioning problem. Applications can be found in the area of …
We develop a multilevel algorithm for hypergraph partitioning that contracts the vertices one at a time and thus allows very high quality. This includes a rating function that avoids nonuniform vertex weights, an efficient “semi-dynamic” hypergraph data structure, a very fast coarsening algorithm, and two new local search algorithms. One is a k-way …
We present an improved coarsening process for multilevel hypergraph partitioning that incorporates global information about the community structure. Community detection is performed via modularity maximization on a bipartite graph representation. The approach is made suitable for different classes of hypergraphs by defining weights for the graph edges that …
The distributed duplicate removal problem is concerned with the detection and subsequent elimination of all duplicate elements in a given multiset that is distributed over several computers connected by a network. Sanders et al. [48] outline a communication-efficient algorithm solving this problem. It uses distributed compressed single-shot Bloom filters to …
Many problems in computer science can be represented by a graph and reduced to a graph clustering or k-way partitioning problem. In the classical definition, a graph consists of nodes and edges which usually connect exactly two nodes. Hypergraphs are a generalization of graphs, where every edge can connect an arbitrary number of nodes. …
We present the design and a first performance evaluation of Thrill – a prototype of a general purpose big data processing framework with a convenient data-flow style programming interface. Thrill is somewhat similar to Apache Spark and Apache Flink with at least two main differences. First, Thrill is based on C++ which enables …