Nearest neighbor (NN) search in high-dimensional space is an important problem in many applications. Ideally, a practical solution (i) should be implementable in a relational database, and (ii) its query cost should grow <i>sub-linearly</i> with the dataset size, regardless of the data and query distributions. Despite the bulk of NN literature, no solution…
Nearest Neighbor (NN) search in high-dimensional space is an important problem in many applications. From the database perspective, a good solution needs to have two properties: (i) it can be easily incorporated in a relational database, and (ii) its query cost should increase <i>sublinearly</i> with the dataset size, regardless of the data and query…
Being popular on YouTube is becoming a fundamental way of promoting oneself, one's services, or one's products. In this paper, we conduct an in-depth study of fundamental properties of video popularity on YouTube. We collect and study arguably the largest dataset of YouTube videos, roughly 37 million, accounting for 25% of all YouTube videos. We analyze…
Given two vertices <i>s</i>, <i>t</i> in a graph, let <i>P</i> be the shortest path (SP) from <i>s</i> to <i>t</i>, and <i>P*</i> a subset of the vertices in <i>P</i>. <i>P*</i> is a <i>k</i>-skip shortest path from <i>s</i> to <i>t</i> if it includes at least one vertex out of every <i>k</i> consecutive vertices in <i>P</i>. In general, <i>P*</i> succinctly describes <i>P</i>…
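The <i>k</i>-skip condition above can be checked mechanically. The following is a minimal sketch, not the paper's algorithm: it only verifies the stated definition for a path that is already known. The function name `is_k_skip` and the list-based path representation are assumptions made for illustration.

```python
from typing import Hashable, Sequence


def is_k_skip(path: Sequence[Hashable], p_star: Sequence[Hashable], k: int) -> bool:
    """Check whether p_star satisfies the k-skip condition with respect to path.

    path   : vertices of the shortest path P, in order from s to t
    p_star : candidate subset P* of those vertices
    k      : window length (must be at least 1)
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    chosen = set(p_star)
    # P* must be a subset of the vertices in P.
    if not chosen <= set(path):
        return False

    # Every window of k consecutive vertices in P must contain a vertex of P*.
    # If the path has fewer than k vertices there are no windows to check.
    return all(
        any(v in chosen for v in path[i:i + k])
        for i in range(len(path) - k + 1)
    )
```

For example, with the hypothetical path `["s", "a", "b", "c", "d", "t"]` and `k = 3`, the subset `["b", "d"]` passes the check, while `["s", "t"]` fails because the window `a, b, c` contains neither endpoint.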
This paper studies the <i>nearest keyword</i> (<i>NK</i>) problem on XML documents. In general, the dataset is a tree where each node is associated with one or more keywords. Given a node <i>q</i> and a keyword <i>w</i>, an NK query returns the node that is nearest to <i>q</i> among all the nodes associated with <i>w</i>. NK search is not only useful as a stand-alone operator but also…
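To make the NK query concrete, here is a brute-force baseline, not the indexing method the paper proposes. It assumes that "nearest" means the fewest tree edges between two nodes, which the excerpt does not state explicitly. The names `nearest_keyword`, `adj`, and `keywords` are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Optional, Set


def nearest_keyword(
    adj: Dict[Hashable, Iterable[Hashable]],
    keywords: Dict[Hashable, Set[str]],
    q: Hashable,
    w: str,
) -> Optional[Hashable]:
    """Return a node carrying keyword w that is closest to q, or None.

    adj      : undirected adjacency lists of the tree (parent and child links)
    keywords : keywords associated with each node
    Distance is assumed to be the number of edges on the tree path.
    Ties are broken by traversal order.
    """
    seen = {q}
    queue = deque([q])
    # Breadth-first search visits nodes in non-decreasing distance from q,
    # so the first node found with keyword w is a nearest one.
    while queue:
        u = queue.popleft()
        if w in keywords.get(u, ()):
            return u
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return None
```

This baseline visits up to every node per query, so it serves only as a reference point for what an NK query returns.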
We consider the <i>skyline problem</i> (a.k.a. the <i>maxima problem</i>), which has been extensively studied in the database community. The input is a set <i>P</i> of <i>d</i>-dimensional points. A point <i>dominates</i> another if the former has a lower coordinate than the latter on every dimension. The goal is to find the <i>skyline</i>, which is the set…
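The dominance relation above can be illustrated with a quadratic-time sketch. Because the excerpt cuts off before finishing the definition of the skyline, the code assumes the standard one: the points of <i>P</i> that no other point dominates. This is a naive baseline for illustration, not the algorithm studied in the paper, and the function names are hypothetical.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(p: Point, q: Point) -> bool:
    """True if p has a strictly lower coordinate than q on every dimension."""
    return all(a < b for a, b in zip(p, q))


def skyline(points: List[Point]) -> List[Point]:
    """Return the points not dominated by any other point (assumed definition).

    Compares every pair of points, so the cost is O(n^2 * d).
    A point never dominates itself because the comparison is strict.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For the hypothetical input `[(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)]`, the point `(2, 2)` dominates both `(3, 3)` and `(5, 5)`, so the remaining three points form the skyline.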
Conventional spatial queries, such as range search and nearest neighbor retrieval, involve only conditions on objects' geometric properties. Today, many modern applications call for novel forms of queries that aim to find objects satisfying both a spatial predicate, and a predicate on their associated texts. For example, instead of considering all the…
An <i>(edge) hidden graph</i> is a graph whose edges are not explicitly given. Detecting the presence of an edge requires expensive <i>edge-probing</i> queries. We consider the <i>k most connected vertex</i> problem on hidden bipartite graphs. Specifically, given a bipartite graph <i>G</i> with independent vertex sets <i>B</i> and <i>W</i>, the goal is to…
A hidden database refers to a dataset that an organization makes accessible on the web by allowing users to issue queries through a search interface. In other words, data are not acquired from such a source by following static hyperlinks. Instead, data are obtained by querying the interface and reading the dynamically generated result page. This, with…
We consider the <i>orthogonal range aggregation</i> problem. The dataset <i>S</i> consists of <i>N</i> axis-parallel rectangles in R<sup>2</sup>, each of which is associated with an integer <i>weight</i>. Given an axis-parallel rectangle <i>Q</i> and an aggregate function <i>F</i>, a query reports the aggregated result of the weights of the rectangles in…