• Publications
Quality and efficiency in high dimensional nearest neighbor search
TLDR
This work proposes a new access method, the locality-sensitive B-tree (LSB-tree), that enables fast high-dimensional NN search with excellent quality. It reduces space and query cost dramatically and outperforms adhoc-LSH, even though the latter has no quality guarantee.
Semantics of Ranking Queries for Probabilistic Data and Expected Ranks
TLDR
This work is able to prove that, in contrast to all existing approaches, the expected rank satisfies all the required properties for a ranking query, and provides efficient solutions to compute this ranking across the major models of uncertain data, such as attribute-level and tuple-level uncertainty.
The Priority R-tree: a practically efficient and worst-case optimal R-tree
TLDR
This study shows that the PR-tree performs similarly to the best known R-tree variants on real-life and relatively nicely distributed data, but outperforms them significantly on more extreme data.
Efficient and accurate nearest neighbor and closest pair search in high-dimensional space
TLDR
This work improves LSH by proposing an access method called the Locality-Sensitive B-tree (LSB-tree) to enable fast, accurate, high-dimensional NN search in relational databases, and extends the LSB technique to solve another classic problem, called Closest Pair (CP) search, in high-dimensional space.
Finding frequent items in probabilistic data
TLDR
This paper proposes a new definition based on the possible world semantics that has been widely adopted for many query types in uncertain data management, trying to find all the items that are likely to be frequent in a randomly generated possible world.
Wander Join: Online Aggregation via Random Walks
TLDR
This paper proposes a new approach, the wander join algorithm, to the online aggregation problem by performing random walks over the underlying join graph, and designs an optimizer that chooses the optimal plan for conducting the random walks without having to collect any statistics a priori.
Tree indexing on solid state drives
TLDR
This work proposes FD-tree, a tree index designed with the logarithmic method and fractional cascading techniques, which dominates the other B+-tree index variants in overall performance on flash disks as well as on magnetic disks.
Algorithms for distributed functional monitoring
TLDR
A carefully constructed multi-round algorithm that uses "sketch summaries" at multiple levels of detail and solves the (k, F_2, τ, ε) problem with communication and gives upper and lower bounds for the problem for some of the basic f's.
Efficient Processing of Top-k Queries in Uncertain Databases with x-Relations
TLDR
This work introduces novel polynomial algorithms for processing top-k queries in uncertain databases under the generally adopted model of x-relations. These are the first known polynomial algorithms for the problem, whereas the current best algorithms have exponential complexity in both time and space.
Mergeable summaries
TLDR
This paper demonstrates that the MG and the SpaceSaving summaries for heavy hitters are indeed mergeable or can be made mergeable after appropriate modifications, and provides the best known randomized streaming bound for ε-approximate quantiles that depends only on ε, of size O((1/ε) log^{3/2}(1/ε)).