Text Joins for Data Cleansing and Integration in an RDBMS

Authors: Luis Gravano, Panagiotis G. Ipeirotis, Nick Koudas, Divesh Srivastava
An organization’s data records are often noisy because of transcription errors, incomplete information, the lack of standard formats for textual data, or combinations thereof. A fundamental task in a data cleaning system is matching textual attributes that refer to the same entity (e.g., an organization name or address). This matching can be performed effectively with the cosine similarity metric from the information retrieval field. For robustness and scalability, these “text joins” are best done…
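The abstract breaks off before describing the authors' technique, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes lowercase whitespace tokens and tf-idf weights with idf = log(N/df) computed over both inputs; both are illustrative assumptions. Each string's weight vector is L2-normalized and stored as rows of a table, so the cosine similarity of a pair reduces to a SQL join on token followed by a SUM of weight products per pair. The helper names (`tokenize`, `unit_tfidf`, `text_join`) and the tables `lw` and `rw` are hypothetical, and an in-memory SQLite database stands in for a general RDBMS.

```python
import math
import sqlite3
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, then split on whitespace.
    return text.lower().split()


def unit_tfidf(docs):
    """Return one L2-normalized tf-idf vector (dict token -> weight) per doc."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        # Assumed weighting: raw term frequency times log(N / df).
        vec = {t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # A doc whose tokens all appear everywhere has norm 0; give it no weights.
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def text_join(left, right, threshold):
    """Return (left_id, right_id, cosine) rows with cosine >= threshold."""
    vecs = unit_tfidf([tokenize(s) for s in left + right])
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lw (id INTEGER, token TEXT, w REAL)")
    conn.execute("CREATE TABLE rw (id INTEGER, token TEXT, w REAL)")
    conn.executemany(
        "INSERT INTO lw VALUES (?, ?, ?)",
        [(i, t, w) for i, v in enumerate(vecs[:len(left)]) for t, w in v.items()],
    )
    conn.executemany(
        "INSERT INTO rw VALUES (?, ?, ?)",
        [(j, t, w) for j, v in enumerate(vecs[len(left):]) for t, w in v.items()],
    )
    # Dot product of unit vectors = cosine: join on shared tokens, sum products.
    rows = conn.execute(
        """
        SELECT lw.id, rw.id, SUM(lw.w * rw.w) AS sim
        FROM lw JOIN rw ON lw.token = rw.token
        GROUP BY lw.id, rw.id
        HAVING SUM(lw.w * rw.w) >= ?
        ORDER BY lw.id, rw.id
        """,
        (threshold,),
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in text_join(["AT&T Research Labs", "IBM Almaden"],
                         ["AT&T Labs Research", "Microsoft Research"], 0.5):
        print(row)
```

Because the join only pairs strings that share at least one token, pairs with nothing in common never appear in the result, so the threshold should be positive. Tokens that occur in every input string receive idf 0 and contribute nothing to any score.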
This paper has 54 citations.


Publications citing this paper (excerpt from 37 extracted citations):

Efficient set joins on similarity predicates
SIGMOD Conference • 2004 • Highly Influenced

Cohesion based attribute value matching
2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI) • 2017

Semantic Scholar estimates that this publication has 55 citations based on the available data.

