• Publications
Efficient similarity joins for near-duplicate detection
TLDR
This article proposes new filtering techniques that exploit token ordering information to drastically reduce candidate sizes, and hence improve the efficiency of existing algorithms for finding pairs of records whose similarity is no less than a given threshold.
Graph Clustering Based on Structural/Attribute Similarities
TLDR
This paper proposes a novel graph clustering algorithm, SA-Cluster, based on both structural and attribute similarities through a unified distance measure, which partitions a large graph associated with attributes into k clusters so that each cluster contains a densely connected subgraph with homogeneous attribute values.
Taming verification hardness: an efficient algorithm for testing subgraph isomorphism
TLDR
This paper proposes a novel and efficient algorithm for testing subgraph isomorphism, QuickSI, and develops a new feature-based index technique to accommodate QuickSI in the filtering phase.
Finding Top-k Min-Cost Connected Trees in Databases
TLDR
This paper proposes a novel parameterized solution, with l as a parameter, that finds the optimal GST-1 in time $O(3^l n + 2^l((l + \log n)n + m))$, where n and m are the numbers of nodes and edges in graph G, and that can handle graphs with a large number of nodes.
Holistic Twig Joins on Indexed XML Documents
TLDR
Experimental results on various datasets indicate that the proposed index-based algorithm performs significantly better than the existing ones, especially when binary structural joins in the twig pattern have varying join selectivities.
Querying k-truss community in large and dynamic graphs
TLDR
A novel community model based on the k-truss concept is proposed, which has nice structural and computational properties, together with a compact and elegant index structure that supports efficient search of k-truss communities at a cost linear in the community size.
Parameter Free Bursty Events Detection in Text Streams
TLDR
This paper proposes a novel parameter-free probabilistic approach, called feature-pivot clustering, which fully utilizes time information to determine a set of bursty features that may occur in different time windows.
Efficient Computation of the Skyline Cube
TLDR
Two novel algorithms, Bottom-Up and Top-Down, are proposed to compute SKYCUBE efficiently, and the new algorithms are shown to significantly outperform the naive ones.
Influential Community Search in Large Networks
TLDR
This paper introduces a novel community model, called the k-influential community, which is based on the concept of k-core and can capture the influence of a community, and proposes a linear-time online search algorithm to find the top-r k-influential communities in a network.
Path Materialization Revisited: An Efficient Storage Model for XML Data
TLDR
This paper presents a new model-mapping-based storage model, called XParent, and studies the key issues that affect query performance, namely storage schema design (storing XML data across multiple tables) and path materialization (storing path information in databases).