LSH Ensemble: Internet-Scale Domain Search

@article{Zhu2016LSHEI,
  title={LSH Ensemble: Internet-Scale Domain Search},
  author={Erkang Zhu and F. Nargesian and K. Pu and R. Miller},
  journal={ArXiv},
  year={2016},
  volume={abs/1603.07410}
}
We study the problem of domain search where a domain is a set of distinct values from an unspecified universe. We use Jaccard set containment, defined as $|Q \cap X|/|Q|$, as the relevance measure of a domain $X$ to a query domain $Q$. Our choice of Jaccard set containment over Jaccard similarity makes our work particularly suitable for searching Open Data and data on the web, as Jaccard similarity is known to have poor performance over sets with large differences in their domain sizes. We…
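To make the contrast in the abstract concrete, the sketch below computes both measures on a hypothetical small query domain that is fully contained in a much larger domain. Jaccard similarity, $|Q \cap X|/|Q \cup X|$, is diluted by the size of $X$, while containment, $|Q \cap X|/|Q|$, reports full coverage. The domain values and the helper `containment_from_jaccard` are illustrative assumptions; the helper only applies the algebraic identity $|Q \cap X| = J(|Q|+|X|)/(1+J)$ and is not presented as the paper's indexing procedure.

```python
def jaccard_similarity(q: set, x: set) -> float:
    """|Q ∩ X| / |Q ∪ X|: symmetric, so a large X drags the score down."""
    union = q | x
    return len(q & x) / len(union) if union else 0.0


def jaccard_containment(q: set, x: set) -> float:
    """|Q ∩ X| / |Q|: fraction of the query domain Q covered by X."""
    return len(q & x) / len(q) if q else 0.0


def containment_from_jaccard(j: float, q_size: int, x_size: int) -> float:
    """Hypothetical helper: recover containment from J and the set sizes.

    From J = i / (|Q| + |X| - i), solving for the intersection size i
    gives i = J * (|Q| + |X|) / (1 + J); dividing by |Q| yields containment.
    """
    return j * (q_size + x_size) / ((1 + j) * q_size)


# Hypothetical domains: a 3-value query fully contained in a 1000-value domain.
query = {"Toronto", "Ottawa", "Montreal"}
large_domain = {f"city_{i}" for i in range(997)} | query

j = jaccard_similarity(query, large_domain)
print(j)                                         # 0.003
print(jaccard_containment(query, large_domain))  # 1.0
print(round(containment_from_jaccard(j, len(query), len(large_domain)), 6))  # 1.0
```

The example shows why a fixed Jaccard-similarity threshold would miss this match even though every query value appears in the candidate domain.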
50 Citations
LSF-Join: Locality Sensitive Filtering for Distributed All-Pairs Set Similarity Under Skew (2 citations; Highly Influenced)
Selectivity Estimation on Set Containment Search
GB-KMV: An Augmented KMV Sketch for Approximate Containment Similarity Search (6 citations; Highly Influenced)
Lazo: A Cardinality-Based Method for Coupled Estimation of Jaccard Similarity and Containment (10 citations; Highly Influenced)
Adaptive Top-k Overlap Set Similarity Joins
Scalable Data Discovery Using Profiles
JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes (23 citations)
Table Union Search on Open Data (49 citations)
Pytheas: Pattern-based Table Discovery in CSV Files (2 citations)
Similarity query processing for high-dimensional data (1 citation)

References

Showing references 1–10 of 30
LSH forest: self-tuning indexes for similarity search (350 citations; Highly Influential)
On indexing error-tolerant set containment (48 citations)
Similarity Search in High Dimensions via Hashing (3,254 citations)
MeanKS: meaningful keyword search in relational databases with complex schema (13 citations)
WebTables: exploring the power of tables on the web (605 citations)
Answering Table Queries on the Web using Column Keywords (107 citations)
Locality-sensitive hashing scheme based on p-stable distributions (2,563 citations)
InfoGather: entity augmentation and attribute discovery by holistic matching with web tables (196 citations)
An Empirical Performance Evaluation of Relational Keyword Search Techniques (71 citations)