Top-k Set Similarity Joins

Abstract

Similarity join is a useful primitive operation underlying many applications, such as near-duplicate Web page detection, data integration, and pattern recognition. Traditional similarity joins require a user to specify a similarity threshold. In this paper, we study a variant of the similarity join, termed top-k set similarity join. It returns the top-k…
DOI: 10.1109/ICDE.2009.111
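
To make the contrast with threshold-based joins concrete, here is a minimal brute-force sketch in Python. It is not the paper's algorithm. It assumes that the operation returns the k record pairs with the highest similarity and that similarity is measured with Jaccard; the abstract as shown here is truncated and confirms neither. The names jaccard and topk_join and the sample records are hypothetical.

```python
import heapq
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def topk_join(records, k):
    """Return the k highest-scoring pairs as (similarity, i, j) tuples.

    Assumption: "top-k" means the k most similar pairs. This is a naive
    all-pairs scan, quadratic in the number of records, used only to
    illustrate the operation. No similarity threshold is supplied.
    """
    sets = [frozenset(r) for r in records]
    scored = (
        (jaccard(sets[i], sets[j]), i, j)
        for i, j in combinations(range(len(sets)), 2)
    )
    # nlargest keeps only k candidates in memory while scanning all pairs.
    return heapq.nlargest(k, scored)


# Hypothetical sample data: each record is a set of tokens.
records = [
    {"a", "b", "c", "d"},
    {"a", "b", "c", "e"},
    {"x", "y"},
    {"x", "y", "z"},
]
result = topk_join(records, 2)
for sim, i, j in result:
    print(f"records {i} and {j}: similarity {sim:.3f}")
```

The caller supplies only k, not a threshold, which is the difference from a traditional similarity join that the abstract points out. A practical implementation would avoid scoring every pair.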

Statistics

Semantic Scholar estimates that this publication has 139 citations based on the available data.
