Nearest neighbor (NN) search in high dimensional space is an important problem in many applications. Ideally, a practical solution (i) should be implementable in a relational database, and (ii) its query cost should grow <i>sub-linearly</i> with the dataset size, regardless of the data and query distributions. Despite the bulk of NN literature, no solution…
This paper studies the <i>nearest keyword</i> (<i>NK</i>) problem on XML documents. In general, the dataset is a tree where each node is associated with one or more keywords. Given a node q and a keyword w, an NK query returns the node that is nearest to q among all the nodes associated with w. NK search is not only useful as a stand-alone operator but also…
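The NK query defined above can be illustrated with a brute-force baseline. This is only a minimal sketch, not the paper's method: it assumes distance is the number of tree edges between two nodes, and the names `nearest_keyword`, `adj`, and `keywords` are hypothetical, introduced purely for illustration.

```python
from collections import deque


def nearest_keyword(adj, keywords, q, w):
    """Return a node nearest to q (in tree edges) whose keyword set contains w.

    adj      -- dict mapping each node to its neighbours (parent and children)
    keywords -- dict mapping each node to its set of keywords
    Returns None when no node carries w. Ties are broken by traversal order.
    """
    seen = {q}
    queue = deque([q])
    while queue:
        node = queue.popleft()
        # Breadth-first order visits nodes by non-decreasing distance from q,
        # so the first match is a nearest one.
        if w in keywords.get(node, ()):
            return node
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None


# Hypothetical five-node tree (assumed data, not from the paper).
adj = {
    "root": ["a", "b"],
    "a": ["root", "a1"],
    "b": ["root", "b1"],
    "a1": ["a"],
    "b1": ["b"],
}
keywords = {
    "root": {"book"},
    "a": {"title"},
    "b": {"author"},
    "a1": {"xml"},
    "b1": {"xml", "title"},
}
```

With this data, `nearest_keyword(adj, keywords, "a", "author")` returns `"b"`, reached through `root`. The scan takes time linear in the tree size per query, which is the cost an index for NK search would aim to avoid.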
Quantiles are a crucial type of order statistics in databases. Extensive research has focused on maintaining a space-efficient structure for approximate quantile computation as the underlying dataset is updated. The existing solutions, however, are designed to support only the current, most up-to-date snapshot of the dataset. Queries on the past versions…
Nearest Neighbor (NN) search in high-dimensional space is an important problem in many applications. From the database perspective, a good solution needs to have two properties: (i) it can be easily incorporated in a relational database, and (ii) its query cost should increase <i>sublinearly</i> with the dataset size, regardless of the data and query…
A hidden database refers to a dataset that an organization makes accessible on the web by allowing users to issue queries through a search interface. In other words, data acquisition from such a source is not by following static hyperlinks. Instead, data are obtained by querying the interface and reading the dynamically generated result page. This, with…
We consider the <i>skyline problem</i> (a.k.a. the <i>maxima problem</i>), which has been extensively studied in the database community. The input is a set <i>P</i> of <i>d</i>-dimensional points. A point <i>dominates</i> another if the former has a lower coordinate than the latter on every dimension. The goal is to find the <i>skyline</i>, which is the set…
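The dominance relation above can be checked directly in code. The sketch below is a naive quadratic in-memory baseline, not the algorithm of the paper, and it assumes the standard definition of the skyline as the set of points not dominated by any other point (the abstract is cut off before stating it). The names `dominates` and `skyline` are hypothetical.

```python
def dominates(p, q):
    """True if p has a strictly lower coordinate than q on every dimension."""
    return all(a < b for a, b in zip(p, q))


def skyline(points):
    """Return the points that no other point dominates.

    Assumes the standard skyline definition; compares every pair of
    points, so it runs in O(n^2 * d) time for n points in d dimensions.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical 2-dimensional input (assumed data, not from the paper).
points = [(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)]
```

Here `skyline(points)` returns `[(1, 5), (2, 2), (5, 1)]`: both `(3, 3)` and `(4, 4)` are dominated by `(2, 2)`. Because dominance is strict on every dimension, a point never dominates itself, so no self-comparison guard is needed.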
Let D be a given set of (string) documents of total length n. The top-k document retrieval problem is to index D such that, when a pattern P of length p and a parameter k arrive as a query, the index returns the k documents that are most relevant to P. We present the first non-trivial external memory index supporting top-k document retrieval queries in…
We consider the <i>orthogonal range aggregation</i> problem. The dataset <i>S</i> consists of <i>N</i> axis-parallel rectangles in R<sup>2</sup>, each of which is associated with an integer <i>weight</i>. Given an axis-parallel rectangle <i>Q</i> and an aggregate function <i>F</i>, a query reports the aggregated result of the weights of the rectangles in…