Top-k Set Similarity Joins


Similarity join is a useful primitive operation underlying many applications, such as near-duplicate Web page detection, data integration, and pattern recognition. Traditional similarity joins require a user to specify a similarity threshold. In this paper, we study a variant of the similarity join, termed top-k set similarity join. It returns the top-k…
DOI: 10.1109/ICDE.2009.111
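
To make the contrast with threshold-based joins concrete, here is a minimal brute-force sketch. It is not the paper's algorithm. It assumes that records are sets of tokens, that similarity is measured with Jaccard, and that the join returns the k highest-scoring pairs. The abstract as shown here does not confirm any of these details, and the names `jaccard` and `naive_topk_join` are hypothetical.

```python
import heapq
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets (assumed measure)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def naive_topk_join(records, k):
    """Return the k most similar pairs as (similarity, i, j), best first.

    Hypothetical baseline: scores every pair, so it needs O(n^2)
    similarity computations and no user-supplied threshold.
    """
    scored = (
        (jaccard(records[i], records[j]), i, j)
        for i, j in combinations(range(len(records)), 2)
    )
    # nlargest keeps only the k best tuples while streaming over all pairs.
    return heapq.nlargest(k, scored)


# Toy input: four records, each a set of tokens.
records = [
    {"a", "b", "c", "d"},
    {"a", "b", "c", "e"},
    {"a", "b", "x", "y"},
    {"p", "q"},
]
top2 = naive_topk_join(records, 2)
```

The caller supplies only k, not a similarity threshold, which is the difference the abstract highlights. The quadratic pair enumeration is the cost a dedicated top-k join algorithm would aim to avoid.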

7 Figures and Tables



Semantic Scholar estimates that this publication has 139 citations based on the available data.
