A Primitive Operator for Similarity Joins in Data Cleaning

@inproceedings{Chaudhuri2006APO,
  title={A Primitive Operator for Similarity Joins in Data Cleaning},
  author={Surajit Chaudhuri and Venkatesh Ganti and Raghav Kaushik},
  booktitle={22nd International Conference on Data Engineering (ICDE'06)},
  year={2006},
  pages={5-5}
}
Data cleaning based on similarities involves identification of "close" tuples, where closeness is evaluated using a variety of similarity functions chosen to suit the domain and application. Current approaches for efficiently implementing such similarity joins are tightly tied to the chosen similarity function. In this paper, we propose a new primitive operator which can be used as a foundation to implement similarity joins according to a variety of popular string similarity functions, and…
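The abstract describes the operator only in general terms. As an illustration, here is a minimal Python sketch, assuming the primitive is an overlap-based set-similarity join (the paper calls its operator SSJoin) and that strings are tokenized into padded q-gram sets. The names `qgrams`, `ssjoin`, and `jaccard_join`, the rarest-first token order, and the prefix-filtering details are illustrative assumptions, not the authors' implementation. The point it shows is that a similarity threshold can be rewritten as an overlap threshold: Jaccard(r, s) ≥ t holds exactly when |r ∩ s| ≥ t/(1+t) · (|r| + |s|), so one overlap-join operator can serve more than one similarity function.

```python
import math
from collections import Counter, defaultdict
from fractions import Fraction


def qgrams(text: str, q: int = 3) -> frozenset:
    """Set of q-grams of `text`, padded with '#' and '$' (hypothetical tokenizer)."""
    padded = "#" * (q - 1) + text.lower() + "$" * (q - 1)
    return frozenset(padded[i:i + q] for i in range(len(padded) - q + 1))


def ssjoin(sets, pair_bound, record_bound):
    """Hypothetical overlap self-join: return pairs (j, i), j < i, such that
    |sets[j] & sets[i]| >= pair_bound(|sets[j]|, |sets[i]|).

    record_bound(n) must be a positive lower bound on the overlap that any
    qualifying partner must have with a set of size n; it fixes the prefix length.
    """
    freq = Counter(tok for s in sets for tok in s)
    index = defaultdict(list)  # token -> ids of earlier sets with that token in their prefix
    pairs = []
    for i, s in enumerate(sets):
        # One global token order (rarest first) shared by all sets.
        toks = sorted(s, key=lambda tok: (freq[tok], tok))
        # Prefix filter: two sets with overlap >= a share a token among their
        # first n - a + 1 tokens under the global order.
        prefix_len = len(toks) - math.ceil(record_bound(len(toks))) + 1
        candidates = set()
        for tok in toks[:prefix_len]:
            candidates.update(index[tok])
            index[tok].append(i)
        # Verify every candidate with the exact overlap predicate.
        for j in candidates:
            if len(sets[j] & s) >= pair_bound(len(sets[j]), len(s)):
                pairs.append((j, i))
    return sorted(pairs)


def jaccard_join(strings, threshold, q=3):
    """Index pairs of strings whose q-gram Jaccard similarity is >= threshold."""
    t = Fraction(threshold)  # exact arithmetic, so ties at the threshold are kept
    if not 0 < t <= 1:
        raise ValueError("threshold must be in (0, 1]")
    sets = [qgrams(x, q) for x in strings]
    # Jaccard(r, s) >= t  <=>  |r & s| >= t / (1 + t) * (|r| + |s|),
    # and any qualifying partner of r overlaps it in at least t * |r| tokens.
    return ssjoin(
        sets,
        pair_bound=lambda a, b: t / (1 + t) * (a + b),
        record_bound=lambda n: t * n,
    )


print(jaccard_join(["abc", "abcd", "xyz"], "0.3"))  # [(0, 1)]
```

Here "abc" and "abcd" share 3 of 8 distinct 3-grams (Jaccard 0.375), while "xyz" shares none, so only the pair (0, 1) survives. The prefix filter only prunes candidate pairs; every pair it keeps is still checked against the exact overlap predicate, so the result matches a brute-force comparison. Passing the threshold as a string keeps the comparison exact through `Fraction`.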

Citations

Publications citing this paper (403 in total; a selection is shown below).

Fast Duplicated Documents Detection using Multi-level Prefix-filter

CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

Efficient Exact Set-Similarity Joins

CITES RESULTS, BACKGROUND & METHODS

Scalable algorithms for signal reconstruction by leveraging similarity joins

Abolfazl Asudeh, Jees Augustine, +4 authors Divesh Srivastava
  • The VLDB Journal
  • 2019
CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

Leveraging Similarity Joins for Signal Reconstruction

CITES BACKGROUND & METHODS
HIGHLY INFLUENCED

Citation statistics

  • 94 highly influenced citations
  • Averaged 30 citations per year from 2017 through 2019