T-NUCA - a novel approach to non-uniform access latency cache architectures for 3D CMPs
This paper studies a new query on uncertain data, called the k-selection query. Given an uncertain dataset of N objects, where each object is associated with a ranking score and a presence probability, a k-selection query returns k objects such that the expected ranking score of the best available object among them is maximized. This query is useful in applications such as information retrieval and decision making. Evaluating k-selection queries poses two challenges: computing the expected maximum score (EMS) of a candidate set, and searching for the optimal result set with the highest EMS. Both involve an extremely large search space. In this paper, we identify several important properties of k-selection queries, including EMS decomposition, query recursion, and EMS bounding. Based upon these properties, we first present a dynamic programming (DP) algorithm that finds the optimal k-selection result in O(k · N) time. We then propose a second algorithm, called the Bounding-and-Pruning (BP) algorithm, that exploits effective search-space pruning strategies to find the optimal selection without accessing low-score data objects. We evaluate the DP and BP algorithms using both synthetic and real-world data. The results show that the proposed solutions outperform the naive approach by several orders of magnitude, and can efficiently answer k-selection queries over 100,000-object datasets within 1 second.
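To make the EMS and the O(k · N) recursion concrete, the following is a minimal Python sketch, not the paper's own implementation. It rests on assumptions the abstract does not state: objects are present independently of one another, a candidate set scores 0 when none of its objects is present, and scores are non-negative. The function names `ems`, `k_selection_dp`, and `k_selection_naive` are hypothetical. With objects ranked by descending score, a set whose top-ranked member has score s and probability p satisfies EMS = p · s + (1 - p) · EMS(rest), which is the recursion the table below exploits.

```python
from itertools import combinations


def ems(objects):
    """Expected maximum score of a set of (score, probability) pairs.

    Assumption: objects are present independently, and the score is 0
    when no object in the set is present.
    """
    expected = 0.0
    none_better_present = 1.0  # probability that all higher-scored objects are absent
    for score, prob in sorted(objects, key=lambda o: o[0], reverse=True):
        expected += none_better_present * prob * score
        none_better_present *= 1.0 - prob
    return expected


def k_selection_naive(objects, k):
    """Brute-force baseline: evaluate the EMS of every k-subset."""
    return max(ems(subset) for subset in combinations(objects, k))


def k_selection_dp(objects, k):
    """Return (best EMS, chosen objects) using an O(k * N) table after sorting."""
    ranked = sorted(objects, key=lambda o: o[0], reverse=True)
    n = len(ranked)
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= len(objects)")

    # best[i][j]: highest EMS achievable with exactly j objects from ranked[i:].
    # Infeasible states (j > n - i) stay at -inf and are never read as operands.
    best = [[float("-inf")] * (k + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        best[i][0] = 0.0

    for i in range(n - 1, -1, -1):
        score, prob = ranked[i]
        for j in range(1, min(k, n - i) + 1):
            skip = best[i + 1][j]
            keep = prob * score + (1.0 - prob) * best[i + 1][j - 1]
            best[i][j] = max(skip, keep)

    # Walk the table forward to recover one optimal set.
    chosen, j = [], k
    for i in range(n):
        if j == 0:
            break
        score, prob = ranked[i]
        keep = prob * score + (1.0 - prob) * best[i + 1][j - 1]
        if keep >= best[i + 1][j]:
            chosen.append(ranked[i])
            j -= 1
    return best[0][k], chosen
```

For example, `k_selection_dp([(10.0, 0.5), (8.0, 0.9), (5.0, 1.0)], 2)` selects the objects scored 10 and 8, since 0.5 · 10 + 0.5 · 0.9 · 8 = 8.6 exceeds the EMS of the other two pairs (7.5 and 7.7). The brute-force function is included only as a correctness check on small inputs; the pruning strategy of the BP algorithm is not sketched here.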