Generalizing prefix filtering to improve set similarity joins

Authors: Leonardo Ribeiro and Theo Härder
Journal: Inf. Syst.
Identifying all pairs of objects in a dataset whose similarity is not less than a specified threshold is of major importance for the management, search, and analysis of data. Set similarity joins are commonly used to implement this operation; they scale to large datasets and are versatile enough to represent a variety of similarity notions. Most methods proposed so far present two main phases at a high level of abstraction: candidate generation, producing a set of candidate pairs, and verification…
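The two-phase scheme described above can be illustrated for the Jaccard similarity. The block below is a minimal Python sketch, assuming records are non-empty sets of string tokens and a threshold 0 < t ≤ 1. Candidate generation uses the classic prefix filter (tokens ordered globally by ascending frequency, prefix length |x| − ⌈t·|x|⌉ + 1) plus a simple length filter; verification then computes the exact similarity of each candidate. This is not the generalized prefix filtering proposed in the paper, and the names `jaccard` and `prefix_filter_join` are hypothetical.

```python
import math
from collections import Counter, defaultdict

EPS = 1e-9  # tolerance against floating-point rounding in threshold arithmetic


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity |a & b| / |a | b| (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def prefix_filter_join(records: list[set[str]], threshold: float) -> list[tuple[int, int, float]]:
    """Hypothetical helper: Jaccard self-join with a classic prefix filter.

    Returns every pair (i, j, sim) with i < j and sim >= threshold.
    Assumes non-empty records and 0 < threshold <= 1.
    """
    sets = [set(r) for r in records]
    # Global token order: rare tokens first, ties broken by the token itself.
    freq = Counter(tok for s in sets for tok in s)
    ordered = [sorted(s, key=lambda tok: (freq[tok], tok)) for s in sets]

    # Phase 1: candidate generation via an inverted index over prefixes.
    index = defaultdict(list)  # token -> ids of earlier records whose prefix holds it
    candidates = set()
    for i, toks in enumerate(ordered):
        n = len(toks)
        # J(x, y) >= t implies |x & y| >= ceil(t * |x|), so the prefixes of
        # length |x| - ceil(t * |x|) + 1 of x and y must share a token.
        prefix_len = n - math.ceil(threshold * n - EPS) + 1
        for tok in toks[:prefix_len]:
            for j in index[tok]:
                m = len(ordered[j])
                # Length filter: J >= t requires min(|x|, |y|) >= t * max(|x|, |y|).
                if min(n, m) >= threshold * max(n, m) - EPS:
                    candidates.add((j, i))
            index[tok].append(i)

    # Phase 2: verify each surviving candidate with the exact similarity.
    results = []
    for j, i in sorted(candidates):
        sim = jaccard(sets[j], sets[i])
        if sim >= threshold:
            results.append((j, i, sim))
    return results


if __name__ == "__main__":
    recs = [{"a", "b", "c", "d"}, {"a", "b", "c", "e"}, {"x", "y", "z"}, {"a", "b", "c", "d", "e"}]
    print(prefix_filter_join(recs, 0.6))
    # -> [(0, 1, 0.6), (0, 3, 0.8), (1, 3, 0.8)]
```

Ordering tokens by ascending frequency is a common heuristic: it places rare tokens in the prefixes, which keeps the inverted lists probed during candidate generation short and so reduces the number of candidates passed to verification.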

Publications citing this paper (29 extracted citations):

PEL: Position-Enhanced Length Filter for Set Similarity Joins

Grundlagen von Datenbanken • 2014
Highly Influenced

Similarity Joins in Relational Database Systems

2013
Highly Influenced

Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXVIII

Hui Ma, Gerhard Goos, +6 authors, Hui Ma (Eds.)
Lecture Notes in Computer Science • 2018

