Publications
Efficient similarity joins for near-duplicate detection
TLDR
This article proposes new filtering techniques that exploit token ordering information to drastically reduce candidate sizes, thereby improving the efficiency of existing algorithms for finding all pairs of records whose similarities are no less than a given threshold.
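To make the problem setting concrete, here is a minimal Python sketch of a Jaccard self-join using plain prefix filtering, the standard filter that relies on a global token ordering. It is an illustration under assumptions (Jaccard as the similarity function, a rarest-first token order, records given as token lists, 0 < t <= 1), not the article's algorithm; the additional filters the article proposes are not reproduced. The names `jaccard` and `prefix_filter_join` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b| of two token sets (union must be non-empty)."""
    return len(a & b) / len(a | b)


def prefix_filter_join(records: list[list[str]], t: float) -> list[tuple[int, int]]:
    """Hypothetical helper: all pairs (i, j), i < j, with Jaccard(records[i], records[j]) >= t.

    Candidates come from prefix filtering and are then verified exactly,
    so the output equals that of a brute-force join.
    """
    sets = [set(r) for r in records]
    # Global token ordering: rarest tokens first (ties broken by the token itself),
    # so prefixes consist of infrequent tokens and generate few candidates.
    freq = Counter(tok for s in sets for tok in s)
    index = defaultdict(list)  # token -> ids of earlier records whose prefix contains it
    results = []
    for i, s in enumerate(sets):
        tokens = sorted(s, key=lambda tok: (freq[tok], tok))
        # If Jaccard(x, y) >= t then |x & y| >= ceil(t * |x|), so prefixes of length
        # |x| - ceil(t * |x|) + 1 must share a token. The epsilon guards against
        # floating-point error and can only lengthen the prefix (never lose results).
        alpha = math.ceil(t * len(tokens) - 1e-9)
        prefix_len = len(tokens) - alpha + 1
        candidates = set()
        for tok in tokens[:prefix_len]:
            candidates.update(index[tok])
            index[tok].append(i)
        for j in candidates:
            if jaccard(sets[j], s) >= t:  # exact verification of each candidate
                results.append((j, i))
    return sorted(results)
```

For example, with records `[["a","b","c","d"], ["a","b","c","e"], ["x","y"], ["a","b","c","d","e"]]` and `t = 0.6`, the join returns `[(0, 1), (0, 3), (1, 3)]`; record 2 shares no prefix token with the others and is never verified.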
Taming verification hardness: an efficient algorithm for testing subgraph isomorphism
TLDR
This paper proposes QuickSI, a novel and efficient algorithm for testing subgraph isomorphism, and develops a new feature-based index technique to accommodate QuickSI in the filtering phase.
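As a point of reference for what "testing subgraph isomorphism" involves, the sketch below is a plain backtracking test with label and degree pruning, assuming undirected, vertex-labeled graphs and the non-induced (edge-preserving, injective) definition. It is not QuickSI; the paper's matching-order and indexing techniques are not reproduced, and the names `build_adjacency` and `find_embedding` are hypothetical.

```python
def build_adjacency(n: int, edges: list[tuple[int, int]]) -> list[set[int]]:
    """Undirected adjacency sets for vertices 0..n-1."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def find_embedding(q_labels: list[str], q_edges: list[tuple[int, int]],
                   g_labels: list[str], g_edges: list[tuple[int, int]]):
    """Hypothetical helper: return an injective, label- and edge-preserving map from
    query vertices to data vertices, or None if the query is not contained."""
    q_adj = build_adjacency(len(q_labels), q_edges)
    g_adj = build_adjacency(len(g_labels), g_edges)
    mapping: dict[int, int] = {}
    used: set[int] = set()

    def extend(u: int) -> bool:
        # Query vertices are matched in plain index order 0, 1, 2, ...
        if u == len(q_labels):
            return True
        for v in range(len(g_labels)):
            # Prune by injectivity, label equality and degree.
            if v in used or g_labels[v] != q_labels[u] or len(g_adj[v]) < len(q_adj[u]):
                continue
            # Every edge to an already-matched query neighbour must exist in the data graph.
            if all(mapping[w] in g_adj[v] for w in q_adj[u] if w in mapping):
                mapping[u] = v
                used.add(v)
                if extend(u + 1):
                    return True
                del mapping[u]
                used.discard(v)
        return False

    return dict(mapping) if extend(0) else None
```

The worst case of this naive search is exponential, which is why the order in which query vertices are matched and the filtering applied before verification matter so much in practice.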
Finding Top-k Min-Cost Connected Trees in Databases
TLDR
This paper proposes a novel parameterized solution, with l as a parameter, that finds the optimal GST-1 in O(3^l n + 2^l ((l + log n) n + m)) time, where n and m are the numbers of nodes and edges in graph G; the solution can handle graphs with a large number of nodes.
Probabilistic Skylines on Uncertain Data
TLDR
A novel probabilistic skyline model is proposed in which an uncertain object has a probability of being in the skyline, and a p-skyline contains all objects whose skyline probabilities are at least p.
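To illustrate the definition, the sketch below computes skyline probabilities by brute force, assuming a common discrete model: each uncertain object is a set of instances whose probabilities sum to 1, objects are independent, and smaller values are better in every dimension. Under these assumptions the skyline probability of object U is the sum over its instances u of P(u) times the product, over every other object V, of (1 - total probability of V's instances that dominate u). This is a hypothetical illustration, not the paper's algorithms; `skyline_probability` and `p_skyline` are made-up names.

```python
Point = tuple[float, ...]
UncertainObject = list[tuple[Point, float]]  # (instance, probability); probabilities sum to 1


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse in every dimension and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probability(objects: list[UncertainObject], idx: int) -> float:
    """Probability that objects[idx] is in the skyline, assuming independent objects."""
    total = 0.0
    for u, p_u in objects[idx]:
        # When objects[idx] takes instance u, it is in the skyline iff no other
        # object takes an instance that dominates u.
        prob = p_u
        for j, other in enumerate(objects):
            if j != idx:
                prob *= 1.0 - sum(p_v for v, p_v in other if dominates(v, u))
        total += prob
    return total


def p_skyline(objects: list[UncertainObject], p: float) -> list[int]:
    """Indices of objects whose skyline probability is at least p."""
    return [i for i in range(len(objects)) if skyline_probability(objects, i) >= p]
```

For instance, with A = {(1, 1), (4, 4)} at probability 0.5 each, B = {(2, 2)} and C = {(5, 5)}, A and B each have skyline probability 0.5 and C has 0, so the 0.5-skyline is {A, B}.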
Selecting Stars: The k Most Representative Skyline Operator
TLDR
An efficient, scalable, index-based randomized algorithm is developed by applying the FM probabilistic counting technique; a comprehensive performance evaluation demonstrates that it is very efficient, highly accurate, and scalable.
Ranking queries on uncertain data: a probabilistic threshold approach
TLDR
An efficient exact algorithm, a fast sampling algorithm, and a Poisson-approximation-based algorithm are presented for answering probabilistic threshold top-k queries on uncertain data, which return the uncertain records that have a probability of at least p of being in the top-k list.
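Assuming independent records with distinct scores (a simplification: any dependencies between records, such as mutually exclusive alternatives, are ignored here), a record's top-k probability is its existence probability times the probability that fewer than k higher-scoring records exist. The dynamic program below computes this exactly in O(nk) after sorting; it is a hedged sketch of the query semantics, not the paper's exact algorithm, and `topk_probabilities` and `pt_topk` are hypothetical names.

```python
def topk_probabilities(records: list[tuple[float, float]], k: int) -> list[float]:
    """records[i] = (score, existence probability). Returns, in input order, each
    record's probability of being among the k highest-scoring records that exist.

    Assumes k >= 1, independent records and distinct scores.
    """
    order = sorted(range(len(records)), key=lambda i: records[i][0], reverse=True)
    # dist[c] = probability that exactly c of the records processed so far exist,
    # kept only for c < k (larger counts can never let a later record into the top-k).
    dist = [1.0] + [0.0] * (k - 1)
    probs = [0.0] * len(records)
    for i in order:
        p = records[i][1]
        # Record i is in the top-k iff it exists and fewer than k higher-scoring records exist.
        probs[i] = p * sum(dist)
        new = [0.0] * k
        for c in range(k):
            new[c] += dist[c] * (1.0 - p)  # record i absent: count unchanged
            if c + 1 < k:
                new[c + 1] += dist[c] * p  # record i present: count grows by one
        dist = new
    return probs


def pt_topk(records: list[tuple[float, float]], k: int, threshold: float) -> list[int]:
    """Indices of records whose top-k probability is at least the threshold p."""
    return [i for i, pr in enumerate(topk_probabilities(records, k)) if pr >= threshold]
```

With records `[(90, 0.5), (80, 0.6), (70, 1.0)]` and k = 1, the top-1 probabilities are 0.5, 0.3 and 0.2: the last record is in the top-1 list only when both higher-scoring records are absent (0.5 x 0.4).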
SPARK2: Top-k Keyword Query in Relational Databases
TLDR
This paper proposes a new ranking formula that adapts existing IR techniques based on a natural notion of a virtual document, along with several efficient query processing methods for the new ranking.
Ed-Join: an efficient algorithm for similarity joins with edit distance constraints
TLDR
A new algorithm, Ed-Join, is proposed that exploits new mismatch-based filtering methods; it substantially reduces candidate sizes and hence saves computation time. Experiments demonstrate that it outperforms alternative methods on large-scale real datasets under a wide range of parameter settings.
Efficient Subgraph Matching by Postponing Cartesian Products
TLDR
For the first time, the issue of unpromising results produced by Cartesian products over "dissimilar" vertices is addressed, and a new framework is proposed that postpones Cartesian products based on the structure of the query to minimize redundant Cartesian products.
Efficient Computation of the Skyline Cube
TLDR
Two novel algorithms, Bottom-Up and Top-Down, are proposed to compute the SKYCUBE efficiently, and the new algorithms are shown to significantly outperform naive ones.
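For reference, a naive SKYCUBE computation can be sketched as computing the skyline of every non-empty subspace independently, that is, 2^d - 1 separate skyline computations for d dimensions. The sketch below assumes smaller values are better and uses a quadratic nested-loop skyline; it shares no work across subspaces and is not the Bottom-Up or Top-Down algorithm. The names `subspace_skyline` and `naive_skycube` are hypothetical.

```python
from itertools import combinations

Point = tuple[float, ...]


def subspace_skyline(points: list[Point], dims: tuple[int, ...]) -> list[int]:
    """Indices of points not dominated by any other point in the given subspace."""
    def dominates(a: Point, b: Point) -> bool:
        return all(a[d] <= b[d] for d in dims) and any(a[d] < b[d] for d in dims)

    return [i for i, p in enumerate(points) if not any(dominates(q, p) for q in points)]


def naive_skycube(points: list[Point]) -> dict[tuple[int, ...], list[int]]:
    """Skyline of every non-empty subspace, each computed from scratch."""
    d = len(points[0])
    return {
        dims: subspace_skyline(points, dims)
        for size in range(1, d + 1)
        for dims in combinations(range(d), size)
    }
```

For the points (1, 3), (2, 2), (3, 1), (3, 3), the skycube maps subspace (0,) to point 0, (1,) to point 2, and the full space (0, 1) to points 0, 1 and 2.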