HARRA: fast iterative hashed record linkage for large-scale data collections

Hung-sik Kim and Dongwon Lee
We study the performance issue of the "iterative" record linkage (RL) problem, where match and merge operations may occur together in iterations until convergence emerges. We first propose Iterative Locality-Sensitive Hashing (I-LSH), which dynamically merges LSH-based hash tables for quick and accurate blocking. Then, by exploiting inherent characteristics within/across data sets, we develop a suite of I-LSH-based RL algorithms, named HARRA (HAshed RecoRd linkAge…
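To make the idea of hashed blocking combined with iterative match and merge more concrete, here is a minimal sketch. It is not the authors' I-LSH or HARRA algorithm: it assumes character 3-gram shingles, a MinHash signature split into bands as the LSH scheme, Jaccard similarity as the match test, and set union as the merge step. All names (`shingles`, `minhash`, `link_records`) and parameter values are hypothetical and chosen only for illustration.

```python
import hashlib
from itertools import combinations


def shingles(text, k=3):
    """Character k-grams of a lowercased, whitespace-normalized string."""
    s = " ".join(text.lower().split())
    return {s[i:i + k] for i in range(max(1, len(s) - k + 1))}


def minhash(tokens, num_hashes):
    """MinHash signature: for each seed, the minimum hash over all tokens."""
    return tuple(
        min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{t}".encode(), digest_size=8).digest(),
                "big",
            )
            for t in tokens
        )
        for seed in range(num_hashes)
    )


def jaccard(a, b):
    return len(a & b) / len(a | b)


def link_records(records, bands=16, rows=2, threshold=0.5):
    """Group record indices by repeating block -> match -> merge until stable."""
    # Each cluster is (member record ids, merged token set). Assumption: merge = set union.
    clusters = [([i], shingles(r)) for i, r in enumerate(records)]
    while True:
        # Blocking: clusters sharing any signature band land in the same bucket.
        buckets = {}
        for idx, (_, toks) in enumerate(clusters):
            sig = minhash(toks, bands * rows)
            for b in range(bands):
                key = (b, sig[b * rows:(b + 1) * rows])
                buckets.setdefault(key, []).append(idx)

        # Union-find over the clusters of this iteration.
        parent = list(range(len(clusters)))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        # Matching: compare only pairs that share a bucket.
        merged = False
        for bucket in buckets.values():
            for i, j in combinations(bucket, 2):
                ri, rj = find(i), find(j)
                if ri != rj and jaccard(clusters[i][1], clusters[j][1]) >= threshold:
                    parent[rj] = ri
                    merged = True

        if not merged:  # convergence: no new merges in this iteration
            return sorted(sorted(ids) for ids, _ in clusters)

        # Merging: collapse matched clusters, then re-hash them in the next iteration.
        groups = {}
        for idx, (ids, toks) in enumerate(clusters):
            group = groups.setdefault(find(idx), ([], set()))
            group[0].extend(ids)
            group[1].update(toks)
        clusters = list(groups.values())


if __name__ == "__main__":
    sample = ["Dongwon Lee, Penn State University",
              "Dongwon Lee, Penn State Univ.",
              "Hung-sik Kim"]
    print(link_records(sample))
```

Because merged clusters carry the union of their members' tokens, a later iteration can match records that no single original pair would have matched, which is the reason for iterating rather than blocking once. The band and row counts trade recall against the number of candidate pairs compared, and the values used here are arbitrary.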