JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes

@inproceedings{Zhu2019JOSIEOS,
  title={JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes},
  author={Erkang Zhu and Dong Deng and Fatemeh Nargesian and Ren{\'e}e J. Miller},
  booktitle={SIGMOD Conference},
  year={2019}
}
We present a new solution for finding joinable tables in massive data lakes: given a table and one join column, find tables that can be joined with the given table on the largest number of distinct values. The problem can be formulated as an overlap set similarity search problem by considering columns as sets and matching values as intersection between sets. Although set similarity search is well-studied in the field of approximate string search (e.g., fuzzy keyword search), the solutions are…
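
To make the problem formulation concrete, here is a minimal brute-force sketch of overlap set similarity search: each column is treated as a set of distinct values, and candidates are ranked by the size of their intersection with the query column. This is only an illustration of the problem statement, not the JOSIE algorithm itself; the function name `top_k_overlap`, the parameter `k`, and the toy `lake` data are assumptions introduced for this example.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def top_k_overlap(
    query: Iterable[Hashable],
    columns: Dict[str, Iterable[Hashable]],
    k: int,
) -> List[Tuple[str, int]]:
    """Rank candidate columns by distinct-value overlap with the query column.

    Hypothetical brute-force baseline: it scans every column, so it only
    illustrates the problem definition, not an efficient search method.
    """
    query_set = set(query)  # a column is viewed as a set of distinct values
    scored = []
    for name, values in columns.items():
        overlap = len(query_set & set(values))  # number of matching distinct values
        if overlap > 0:
            scored.append((name, overlap))
    # Largest overlap first; break ties by column name for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


# Toy data lake (made up for this example): column name -> column values.
lake = {
    "orders.customer_id": ["c1", "c2", "c3", "c3"],
    "tickets.user": ["c2", "c3", "c9"],
    "products.sku": ["p1", "p2"],
}
result = top_k_overlap(["c1", "c2", "c3", "c4"], lake, k=2)
print(result)
```

A full scan like this costs time proportional to the total size of the data lake for every query, which is why an indexed search method is needed at scale.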
