Ralf Schenkel

Learn More
Motivated by the ongoing success of Linked Data and the growing amount of semantic data sources available on the Web, new challenges to query processing are emerging. Especially in distributed settings that require joining data provided by multiple sources, sophisticated optimization techniques are necessary for efficient query processing. We propose novel(More)
In this paper we present HOPI, a new connection index for XML documents based on the concept of the 2–hop cover of a directed graph introduced by Cohen et al. In contrast to most of the prior work on XML indexing we consider not only paths with child or parent relationships between the nodes, but also provide space– and time–efficient reachability tests(More)
The HOPI index, a connection index for XML documents based on the concept of a 2-hop cover, provides space- and time-efficient reachability tests along the ancestor, descendant, and link axes to support path expressions with wildcards in XML search engines. This paper presents enhanced algorithms for building HOPI, shows how to augment the index with(More)
Recent IR extensions to XML query languages such as Xpath 1.0 Full-Text or the NEXI query language of the INEX benchmark series reflect the emerging interest in IR-style ranked retrieval over semistructured data. TopX is a top-k retrieval engine for text and semistructured data. It terminates query execution as soon as it can safely determine the k(More)
Martin Theobald, Gerhard Weikum, Ralf Schenkel Max-Planck Institute of Computer Science D-66123 Saarbruecken, Germany {mtb, weikum, schenkel}@mpi-sb.mpg.de Abstract Top-k queries based on ranking elements of multidimensional datasets are a fundamental building block for many kinds of information discovery. The best known general-purpose algorithm for(More)
In addition to purely occurrence-based relevance models, term proximity has been frequently used to enhance retrieval quality of keyword-oriented retrieval systems. While there have been approaches on effective scoring functions that incorporate proximity, there has not been much work on algorithms or access methods for their efficient evaluation. This(More)
Online communities have become popular for publishing and searching content, as well as for finding and connecting to other users. User-generated content includes, for example, personal blogs, bookmarks, and digital photos. These items can be annotated and rated by different users, and these social tags and derived user-specific scores can be leveraged for(More)
Top-<i>k</i> query processing is an important building block for ranked retrieval, with applications ranging from text and data integration to distributed aggregation of network logs and sensor data. Top-<i>k</i> queries operate on index lists for a query's elementary conditions and aggregate scores for result candidates. One of the best implementation(More)
This paper presents a novel engine, coined TopX, for efficient ranked retrieval of XML documents over semistructured but nonschematic data collections. The algorithm follows the paradigm of threshold algorithms for top-k query processing with a focus on inexpensive sequential accesses to index lists and only a few judiciously scheduled random accesses. The(More)