Given a set D = {d1, d2, ..., dD} of D strings of total length n, our task is to report the "most relevant" strings for a given query pattern P. This involves somewhat more advanced query functionality than the usual pattern matching, as some notion of "most relevant" is involved. In information retrieval literature, this task is best achieved...
It is infeasible for a sensor database to contain the exact value of each sensor at all points in time. This uncertainty is inherent in these systems due to measurement and sampling errors, and resource limitations. In order to avoid drawing erroneous conclusions based upon stale data, the use of uncertainty intervals that model each data item as a range...
Ranking is an important property that needs to be fully supported by current relational query engines. Recently, several rank-join query operators have been proposed based on rank aggregation algorithms. Rank-join operators progressively rank the join results while performing the join operation. The new operators have a direct impact on traditional query...
Let D = {d1, d2, ..., dD} be a given set of D string documents of total length n. Our task is to index D such that the k most relevant documents for an online query pattern P of length p can be retrieved efficiently. We propose an index of size |CSA| + n log D(2 + o(1)) bits and O(ts(p) + k log log n + poly log log n) query time for the basic relevance metric...
Rank-aware query processing has emerged as a key requirement in modern applications. In these applications, efficient and adaptive evaluation of top-k queries is an integral part of the application semantics. In this article, we introduce a rank-aware query optimization framework that fully integrates rank-join operators into relational query...
... the cleansed value directly is highly desirable. Data cleansing applications often result in uncertainty in the "cleaned" value of an attribute. Many cleansing tools provide alternative corrections with associated likelihood. Uncertainty in categorical data is commonplace in many applications, including data cleaning, database integration, and biological...
We introduce a new variant of the popular Burrows-Wheeler transform (BWT) called Geometric Burrows-Wheeler Transform (GBWT). Unlike BWT, which merely permutes the text, GBWT converts the text into a set of points in 2-dimensional geometry. Using this transform, we can answer many open questions in compressed text indexing: (1) Can compressed data...
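As background for the remark that BWT "merely permutes the text", here is a minimal sketch of the classic Burrows-Wheeler transform and its inverse. It does not implement GBWT. The function names `bwt` and `inverse_bwt` and the `$` sentinel are illustrative assumptions, and the quadratic rotation sort is chosen for clarity, not efficiency.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Classic BWT via sorted rotations (illustrative, not GBWT)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    # Sort all cyclic rotations; the transform is the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Naive inverse: repeatedly prepend the last column and re-sort."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The original text is the row that ends with the sentinel.
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]


transformed = bwt("banana")
print(transformed)               # annb$aa
print(inverse_bwt(transformed))  # banana
```

The output contains exactly the characters of the input plus the sentinel, only reordered, which is the sense in which the transform is a permutation of the text.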
Orion is a state-of-the-art uncertain database management system with built-in support for probabilistic data as first-class data types. In contrast to other uncertain databases, Orion supports both attribute and tuple uncertainty with arbitrary correlations. This enables the database engine to handle both discrete and continuous pdfs in a natural and...
Current data structures for searching large string collections either fail to achieve minimum space or cause too many cache misses. In this paper we discuss some edge linearizations of the classic trie data structure that are simultaneously cache-friendly and compressed. We provide new insights on front coding [24], introduce other novel linearizations, and...
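For readers unfamiliar with front coding, the following is a minimal sketch of the textbook scheme: each string in a sorted list is stored as the length of the prefix it shares with its predecessor plus the remaining suffix. It is not the paper's linearization, and the names `front_encode` and `front_decode` are hypothetical.

```python
import os


def front_encode(sorted_words):
    """Encode a sorted list as (shared_prefix_length, suffix) pairs."""
    encoded = []
    previous = ""
    for word in sorted_words:
        # os.path.commonprefix compares character by character.
        shared = len(os.path.commonprefix([previous, word]))
        encoded.append((shared, word[shared:]))
        previous = word
    return encoded


def front_decode(pairs):
    """Rebuild the original list by reusing the previous word's prefix."""
    words = []
    previous = ""
    for shared, suffix in pairs:
        word = previous[:shared] + suffix
        words.append(word)
        previous = word
    return words


words = ["car", "card", "care", "cat"]
pairs = front_encode(words)
print(pairs)  # [(0, 'car'), (3, 'd'), (3, 'e'), (2, 't')]
print(front_decode(pairs) == words)  # True
```

Decoding is a sequential scan, since each word depends on the one before it.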