LSH Ensemble: Internet-Scale Domain Search

@article{Zhu2016LSHEI,
  title={LSH Ensemble: Internet-Scale Domain Search},
  author={Erkang Zhu and Fatemeh Nargesian and Ken Q. Pu and Ren{\'e}e J. Miller},
  journal={PVLDB},
  year={2016},
  volume={9},
  pages={1185--1196}
}
We study the problem of domain search where a domain is a set of distinct values from an unspecified universe. We use Jaccard set containment, defined as $|Q \cap X|/|Q|$, as the relevance measure of a domain $X$ to a query domain $Q$. Our choice of Jaccard set containment over Jaccard similarity makes our work particularly suitable for searching Open Data and data on the web, as Jaccard similarity is known to have poor performance over sets with large differences in their domain sizes.
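
The contrast between the two measures can be illustrated with a small sketch. It computes both scores exactly on plain Python sets; it is not the LSH Ensemble index from the paper, and the function names and sample domains are assumptions made for illustration only.

```python
def containment(query, domain):
    """Jaccard set containment of `query` in `domain`: |Q ∩ X| / |Q|."""
    q, x = set(query), set(domain)
    if not q:
        raise ValueError("query domain must be non-empty")
    return len(q & x) / len(q)


def jaccard(a, b):
    """Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical domains: a 3-value query, a small domain that contains it,
# and a 1000-value domain that also contains it.
query = {"ON", "QC", "BC"}
small = {"ON", "QC", "BC", "AB"}
large = small | {f"v{i}" for i in range(996)}

for name, domain in [("small", small), ("large", large)]:
    print(name, containment(query, domain), jaccard(query, domain))
```

Both domains fully contain the query, so containment is 1.0 in each case, while Jaccard similarity falls from 0.75 for the small domain to 0.003 for the large one. This is the size-skew effect the abstract refers to.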

Citations

Publications citing this paper.

Lazo: A Cardinality-Based Method for Coupled Estimation of Jaccard Similarity and Containment

  • 2019 IEEE 35th International Conference on Data Engineering (ICDE)
  • 2019

GB-KMV: An Augmented KMV Sketch for Approximate Containment Similarity Search

  • 2019 IEEE 35th International Conference on Data Engineering (ICDE)
  • 2018

Dimension Reduction on Open Data Using Variational Autoencoder

  • 2018 IEEE International Conference on Data Mining Workshops (ICDMW)
  • 2018

Open Data Integration

