LSH Ensemble: Internet-Scale Domain Search

@article{Zhu2016LSHEI,
  title={LSH Ensemble: Internet-Scale Domain Search},
  author={Erkang Zhu and Fatemeh Nargesian and Ken Q. Pu and Ren{\'e}e J. Miller},
  journal={PVLDB},
  year={2016},
  volume={9},
  pages={1185--1196}
}
  • Erkang Zhu, Fatemeh Nargesian, Ken Q. Pu, Renée J. Miller
  • Published in PVLDB 2016
  • Computer Science
  • We study the problem of domain search where a domain is a set of distinct values from an unspecified universe. We use Jaccard set containment, defined as $|Q \cap X|/|Q|$, as the relevance measure of a domain $X$ to a query domain $Q$. Our choice of Jaccard set containment over Jaccard similarity makes our work particularly suitable for searching Open Data and data on the web, as Jaccard similarity is known to have poor performance over sets with large differences in their domain sizes. We…
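
The difference between the two measures in the abstract can be illustrated with a minimal sketch. It computes both scores exactly on small in-memory sets and is not the paper's indexing method; the function names and the toy domains below are assumptions made up for this example.

```python
def containment(query: set, domain: set) -> float:
    """Set containment of `query` in `domain`: |Q ∩ X| / |Q|."""
    if not query:
        raise ValueError("query domain must be non-empty")
    return len(query & domain) / len(query)


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical toy domains: a 3-value query fully covered by a 1000-value domain.
query = {"ON", "QC", "BC"}
large_domain = {f"v{i}" for i in range(997)} | query

print(containment(query, large_domain))  # 1.0
print(jaccard(query, large_domain))      # 0.003
```

The large domain contains every query value, so containment is 1.0, while Jaccard similarity is dragged down to 0.003 by the size gap between the two sets.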

    Citations

    Publications citing this paper.

    GB-KMV: An Augmented KMV Sketch for Approximate Containment Similarity Search

    Lazo: A Cardinality-Based Method for Coupled Estimation of Jaccard Similarity and Containment

    Interactive Navigation of Open Data Linkages

    ICLab: A Global, Longitudinal Internet Censorship Measurement Platform

    Dimension Reduction on Open Data Using Variational Autoencoder

    Open Data Integration
