Carsten Binnig

Generating databases for testing database applications (e.g., OLAP or business objects) is a daunting task in practice. There are a number of commercial tools to automatically generate test databases. These tools take a database schema (table layouts plus integrity constraints) and table sizes as input in order to generate new tuples. However, the databases …
Today, a common methodology for testing a database management system (DBMS) is to generate a set of test databases and then execute queries on top of them. However, for DBMS testing, it would be a big advantage if we could control the input and/or the output (e.g., the cardinality) of each individual operator of a test query for a particular test case.
Traditionally, the goal of benchmarking a software system is to evaluate its performance under a particular workload for a fixed configuration. The most prominent examples for evaluating transactional database systems, as well as other components on top (such as application servers or web servers), are the various TPC benchmarks. In this paper we argue …
Column-oriented database systems [19, 23] perform better than traditional row-oriented database systems on analytical workloads such as those found in decision support and business intelligence applications. Moreover, recent work [1, 24] has shown that lightweight compression schemes significantly improve the query processing performance of these systems.
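One of the lightweight compression schemes commonly used in column stores is run-length encoding (RLE). The following is a minimal sketch, not code from the paper: the column data and function names are illustrative, and it shows why such schemes help query processing — an aggregate can be computed directly on the compressed runs without decompressing the column.

```python
def rle_encode(column):
    """Compress a column into (value, run_length) pairs."""
    runs = []
    for value in column:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            runs.append((value, 1))
    return runs

def rle_sum(runs):
    """Aggregate directly on the compressed form: no decompression needed."""
    return sum(value * length for value, length in runs)

col = [3, 3, 3, 7, 7, 9]            # sorted columns compress especially well
runs = rle_encode(col)              # [(3, 3), (7, 2), (9, 1)]
assert rle_sum(runs) == sum(col)    # summing the runs matches the raw sum
```

Sorted or clustered columns yield few, long runs, so both storage and the number of operations touched per query shrink.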
Data analytics has recently grown to include increasingly sophisticated techniques, such as machine learning and advanced statistics. Users frequently express these complex analytics tasks as workflows of user-defined functions (UDFs) that specify each algorithmic step. However, given typical hardware configurations and dataset sizes, the core challenge of …
OLTP applications usually implement use cases that execute a sequence of actions, where each action typically reads or updates only a small set of tuples in the database. In order to automatically test the correctness of the different execution paths of the use cases implemented by an OLTP application, a set of test cases and test databases needs to be …
The next generation of high-performance RDMA-capable networks requires a fundamental rethinking of the design of modern distributed in-memory DBMSs. These systems are commonly designed under the assumption that the network is the bottleneck and thus must be avoided as much as possible. This assumption no longer holds true. With InfiniBand FDR 4x, the …
A major challenge in information management today is the integration of huge amounts of data distributed across multiple data sources. One suggested approach to this problem is ontology-based data integration, where legacy data systems are integrated via a common ontology that represents a unified global view over all data sources. However, data is often not …
R2RML defines a language to express mappings from relational data to RDF. That way, applications built on top of the W3C Semantic Technology stack can seamlessly integrate relational data. A major obstacle to using R2RML, though, is the effort of manually curating the mappings. In particular, in scenarios that aim to map data from huge and complex …
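What an R2RML mapping expresses can be sketched procedurally: each row of a relational table is turned into RDF triples via a subject URI template (R2RML's `rr:template` in a `rr:subjectMap`) and per-column predicate mappings (`rr:predicateObjectMap`). The table, columns, and URIs below are illustrative assumptions, not from the paper.

```python
def map_rows_to_triples(rows, subject_tmpl, column_to_predicate):
    """Apply a simple row-to-triples mapping in the spirit of R2RML:
    subject_tmpl plays the role of rr:template in a subject map, and
    column_to_predicate plays the role of the predicate-object maps."""
    triples = []
    for row in rows:
        subject = subject_tmpl.format(**row)       # URI from column values
        for column, predicate in column_to_predicate.items():
            triples.append((subject, predicate, row[column]))
    return triples

# Hypothetical EMP table with one row, mapped to a FOAF name triple.
emp_rows = [{"id": 7, "name": "Alice"}]
triples = map_rows_to_triples(
    emp_rows,
    "http://example.com/emp/{id}",                 # like rr:template
    {"name": "http://xmlns.com/foaf/0.1/name"},    # column -> predicate
)
```

Even this toy version hints at the curation burden the text describes: every table needs a subject template and a predicate chosen per column, which is exactly what grows painful for huge, complex schemas.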
In order to deal with mid-query failures in parallel data engines (PDEs), different fault-tolerance schemes are implemented today: (1) fault tolerance in parallel databases is typically implemented in a coarse-grained manner by restarting a query completely when a mid-query failure occurs, and (2) modern MapReduce-style PDEs implement a fine-grained …