• Publications
Efficient similarity joins for near-duplicate detection
TLDR
We propose new filtering techniques by exploiting the token ordering information; they are integrated into the existing methods and drastically reduce the candidate sizes and hence improve the efficiency.
  • 530
  • 88
SPARK2: Top-k Keyword Query in Relational Databases
TLDR
We propose a new ranking formula by adapting existing IR techniques based on a natural notion of virtual document for answering top-k keyword query in relational database systems.
  • 268
  • 35
Holistic Twig Joins on Indexed XML Documents
TLDR
We address the problem of efficient processing of holistic twig joins on all/partly indexed XML documents.
  • 307
  • 35
XR-tree: indexing XML data for efficient structural joins
TLDR
We propose XR-tree, namely, XML region tree, which is a dynamic external memory index structure specially designed for strictly nested XML data.
  • 252
  • 26
Efficient Computation of the Skyline Cube
TLDR
We consider the problem of efficiently computing a SKYCUBE, which consists of skylines of all possible non-empty subsets of a given set of dimensions.
  • 318
  • 22
Path Materialization Revisited: An Efficient Storage Model for XML Data
TLDR
We present a new model-mapping-based storage model, called XParent, for storing XML data in database management systems.
  • 104
  • 22
SRS: Solving c-Approximate Nearest Neighbor Queries in High Dimensional Euclidean Space with a Tiny Index
TLDR
In this paper, we propose several surprisingly simple methods to answer c-ANN queries with theoretical guarantees requiring only a single tiny index.
  • 61
  • 21
Stabbing the sky: efficient skyline computation over sliding windows
TLDR
We consider the problem of efficiently computing the skyline against the most recent N elements in a data stream in a d-dimension space if the data distribution on each dimension is independent.
  • 285
  • 17
Top-k Set Similarity Joins
TLDR
We study a variant of the similarity join, termed top-k set similarity join.
  • 180
  • 16
Keyword search on structured and semi-structured data
TLDR
An overview of the state-of-the-art techniques for supporting keyword search on structured and semi-structured data, including query result definition, ranking functions, result generation and top-k query processing, snippet generation, result clustering, query cleaning, performance optimization, and search quality evaluation.
  • 196
  • 14