Angelika Reiser

Data stream processing is currently gaining importance due to the developments in novel application areas like e-science, e-health, and e-business (considering RFID, for example). Focusing on e-science, it can be observed that scientific experiments and observations in many fields, e.g., in physics and astronomy, create huge volumes of data which have to be …
eScience and big data analytics applications are facing the challenge of efficiently evaluating complex queries over vast amounts of structured text data archived in network storage solutions. To analyze such data in traditional disk-based database systems, it needs to be bulk loaded, an operation whose performance largely depends on the wire speed of the …
In this paper a new method to increase parallelism in database systems is described. Use is made of the fact that for recovery reasons, we often have two values for one object in the database: the new one and the old one. Introduced and discussed in detail is a certain scheme by which readers and writers may work simultaneously on the same object. It …
MapReduce has emerged as a popular tool for distributed and scalable processing of massive data sets and is being used increasingly in e-science applications. Unfortunately, the performance of MapReduce systems strongly depends on an even data distribution while scientific data sets are often highly skewed. The resulting load imbalance, which raises the …
In 1973 Blum, Floyd et al. [1] presented a linear algorithm to select the i-th smallest element of a set of n distinct numbers. In a more abstract form we are given a set A of n distinct objects with a total order defined on this set and a positive integer i < n. The algorithm computes the i-th smallest element x of A, the element such that there are exactly i − 1 elements of A …
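The selection problem stated in this abstract can be illustrated with a minimal sketch of the classic median-of-medians approach. This is the textbook formulation with groups of five, not necessarily the variant analyzed in the paper; the function name `select` and the 1-based rank convention are assumptions made for illustration.

```python
def select(items, i):
    """Return the i-th smallest element (1-based) of a list of distinct items."""
    if not 1 <= i <= len(items):
        raise ValueError("rank i must satisfy 1 <= i <= len(items)")
    # Small inputs are handled directly by sorting.
    if len(items) <= 5:
        return sorted(items)[i - 1]
    # Split into groups of at most five and take the median of each group.
    groups = [items[j:j + 5] for j in range(0, len(items), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # Recursively pick the median of the medians as the pivot.
    pivot = select(medians, (len(medians) + 1) // 2)
    # Partition around the pivot; distinctness means no ties occur.
    smaller = [x for x in items if x < pivot]
    larger = [x for x in items if x > pivot]
    if i <= len(smaller):
        return select(smaller, i)
    if i == len(smaller) + 1:
        return pivot
    return select(larger, i - len(smaller) - 1)
```

For example, `select([7, 2, 9, 4, 1, 8, 3, 6, 5, 10, 12, 11], 4)` returns the element that has exactly three smaller elements in the list.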
Ever increasing main memory sizes and the advent of multi-core parallel processing have fostered the development of in-core databases. Even the transactional data of large enterprises can be retained in-memory on a single server. Modern in-core databases like our HyPer system achieve best-of-breed OLTP throughput that is sufficient for the lion's share of …
The field of e-science currently faces many challenges. Among the most important ones are the analysis of huge volumes of scientific data and the connection of various sciences and communities, thus enabling scientists to share scientific interests, data, and research results. These issues can be addressed by processing large data volumes on-the-fly in the …
The growth in compute speed has outpaced the growth in network bandwidth over the last decades. This has led to an increasing performance gap between local and distributed processing. A parallel database cluster thus has to maximize the locality of query processing. A common technique to this end is to co-partition relations to avoid expensive data …
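The co-partitioning idea mentioned in this abstract can be sketched as follows. This is an illustrative toy, not the paper's method: both relations are hash-partitioned on the join key with the same function, so matching tuples land in the same partition and each join runs locally. The function names, the dict-based row format, and the use of Python's built-in `hash` are assumptions.

```python
from collections import defaultdict

def partition(rows, key, num_nodes):
    """Assign each row to a node by hashing its join-key value."""
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[hash(row[key]) % num_nodes].append(row)
    return parts

def local_join(left_part, right_part, key):
    """Hash join of two partitions that live on the same (hypothetical) node."""
    index = defaultdict(list)
    for left_row in left_part:
        index[left_row[key]].append(left_row)
    return [
        {**left_row, **right_row}
        for right_row in right_part
        for left_row in index.get(right_row[key], [])
    ]

def co_partitioned_join(left, right, key, num_nodes):
    """Join two relations partition by partition, with no cross-node lookups."""
    left_parts = partition(left, key, num_nodes)
    right_parts = partition(right, key, num_nodes)
    result = []
    for left_part, right_part in zip(left_parts, right_parts):
        result.extend(local_join(left_part, right_part, key))
    return result
```

Because both relations use the same partitioning function on the same key, partition k of one relation only ever needs partition k of the other.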