Marcus Leich

Learn More
We present Stratosphere, an open-source software stack for parallel data analysis. Stratosphere brings together a unique set of features that allow the expressive, easy, and efficient programming of analytical applications at very large scale. Stratosphere’s features include “in situ” data processing, a declarative query language, treatment of user-defined(More)
Recently, quite a few query and scripting languages for MapReduce-based systems have been developed to ease formulating complex data analysis tasks. However, existing tools mainly provide basic operators for rather simple analyses, such as aggregating or filtering. Analytic functionality for advanced applications, such as data cleansing or information(More)
In this paper, we present our approach for searching mathematical formulae. We focus on a batch query approach that does not rely on specialized indexes, which are usually domain dependent and restrict the expressiveness of the query language. Instead, we use Stratosphere, a distributed data processing platform for Big Data Analytics that accesses data in a(More)
Mathematical formulae are essential in science, but face challenges of ambiguity, due to the use of a small number of identifiers to represent an immense number of concepts. Corresponding to word sense disambiguation in Natural Language Processing, we disambiguate mathematical identifiers. By regarding formulae and natural text as one monolithic information(More)
Analyzing big data sets as they occur in modern business and science applications requires query languages that allow for the specification of complex data processing tasks. Moreover, these ideally declarative query specifications have to be optimized, parallelized and scheduled for processing on massively parallel data processing platforms. This paper(More)
With the advent of new contact-less sensors for forensic investigations of latent fingerprint traces, the authors see the need for a benchmarking framework to evaluate existing devices and promising combinations of data acquisition and signal processing techniques. This paper extends the existing benchmarking framework from [1] by categorizing it into(More)
This paper compares the search capabilities of a single human brain supported by the text search built into Wikipedia with state-of-the-art math search systems. To achieve this, we compare results of manual Wikipedia searches with the aggregated and assessed results of all systems participating in the NTCIR-12 MathIR Wikipedia Task. For 26 of the 30 topics,(More)