Efficient parallel set-similarity joins using MapReduce
This paper proposes a 3-stage approach for end-to-end set-similarity joins in parallel using the popular MapReduce framework, and reports results from extensive experiments on real datasets to evaluate the speedup and scaleup properties of the proposed algorithms using Hadoop.
The XML benchmark project
The XML Store Benchmark Project provides a framework to assess an XML database's abilities to cope with a broad spectrum of different queries, typically posed in real-world application scenarios, and offers a set of queries each of which is intended to challenge a particular primitive of the query processor or storage engine.
The OO7 Benchmark
Hyracks: A flexible and extensible foundation for data-intensive computing
The Hyracks end-user model, for authors of dataflow jobs, and the extension model, for users who wish to augment Hyracks' built-in library with new operator and/or connector types, are described.
Concurrency control performance modeling: alternatives and implications
It is shown that differences in the underlying assumptions explain the seemingly contradictory performance results, and the question of how realistic the various assumptions are for actual database systems is addressed.
Shoring up persistent applications
The goals and motivation for SHORE are given, and some novel aspects of the SHORE architecture are described, including a symmetric peer-to-peer server architecture, server customization through an extensible value-added server facility, and support for scalability on multiprocessor systems.
Efficiently publishing relational data as XML documents
The results of an experimental study show that constructing XML documents inside the relational engine can have a significant performance benefit, and that it is best to have the relational engine use what is called an “outer union plan” to generate the content of an XML document.
The HiPAC project: combining active databases and timing constraints
The HiPAC (High Performance ACtive database system) project addresses two critical problems in time-constrained data management: the handling of timing constraints in databases, and the avoidance of