Learn More
In this article, we present an efficient B<sup>&plus;</sup>-tree based indexing method, called iDistance, for K-nearest neighbor (KNN) search in a high-dimensional metric space. iDistance partitions the data based on a space- or data-partitioning strategy, and selects a reference point for each partition. The data points in each partition are transformed(More)
Mobile devices equipped with positioning capabilities (e.g., GPS) can ask location-dependent queries to Location Based Services (<i>LBS</i>). To protect privacy, the user location must not be disclosed. Existing solutions utilize a trusted anonymizer between the users and the LBS. This approach has several drawbacks: (<i>i</i>) All users must trust the(More)
Given a <i>d</i>-dimensional data set, a point <i>p</i> dominates another point <i>q</i> if it is better than or equal to <i>q</i> in all dimensions and better than <i>q</i> in at least one dimension. A point is a skyline point if there does not exists any point that can dominate it. Skyline queries, which return skyline points, are useful in many decision(More)
In data publishing, the owner delegates the role of satisfying user queries to a third-party publisher. As the publisher may be untrusted or susceptible to attacks, it could produce incorrect query results. In this paper, we introduce a scheme for users to verify that their query results are complete (i.e., no qualifying tuples are omitted) and authentic(More)
In this paper, we present the design and evaluation of PeerDB, a peer-to-peer (P2P) distributed data sharing system. PeerDB distinguishes itself from existing P2P systems in several ways. First, it is a full-fledge data management system that supports fine-grain content-based searching. Second, it facilitates sharing of data without shared schema. Third, it(More)
In many decision-making applications, the skyline query is frequently used to find a set of dominating data points (called skyline points) in a multidimensional dataset. In a high-dimensional space skyline points no longer offer any interesting insights as there are too many of them. In this paper, we introduce a novel metric, called skyline frequency that(More)
In this paper, we propose a novel algorithm to discover the top-k covering rule groups for each row of gene expression profiles. Several experiments on real bioinformatics datasets show that the new top-k covering rule mining algorithm is orders of magnitude faster than previous association rule mining algorithms.Furthermore, we propose a new classification(More)
Peer-to-Peer systems have recently become a popular means to share resources. Effective search is a critical requirement in such systems, and a number of distributed search structures have been proposed in the literature. Most of these structures provide "log time search" capability, where the logarithm is taken base 2. That is, in a system with <i>N</i>(More)
In this paper, we present an eÆcient method, called iDistance, for K-nearest neighbor (KNN) search in a high-dimensional space. iDistance partitions the data and selects a reference point for each partition. The data in each cluster are transformed into a single dimensional space based on their similarity with respect to a reference point. This allows the(More)