MapReduce systems have become popular for processing large data sets and are increasingly being used in e-science applications. In contrast to simple application scenarios like word count, e-science applications involve complex computations which pose new challenges to MapReduce systems. In particular, (a) the runtime complexity of the reducer task is ...
This paper describes a new method for increasing parallelism in database systems. It exploits the fact that, for recovery reasons, a database often holds two values for one object: the new one and the old one. A scheme is introduced and discussed in detail by which readers and writers may work simultaneously on the same object. It ...
MapReduce has emerged as a popular tool for distributed and scalable processing of massive data sets and is increasingly being used in e-science applications. Unfortunately, the performance of MapReduce systems strongly depends on an even data distribution, while scientific data sets are often highly skewed. The resulting load imbalance, which raises the ...
The field of e-science currently faces many challenges. Among the most important ones are the analysis of huge volumes of scientific data and the connection of various sciences and communities, thus enabling scientists to share scientific interests, data, and research results. These issues can be addressed by processing large data volumes on-the-fly in the ...
eScience and big data analytics applications face the challenge of efficiently evaluating complex queries over vast amounts of structured text data archived in network storage solutions. To analyze such data in traditional disk-based database systems, the data must first be bulk loaded, an operation whose performance largely depends on the wire speed of the ...