GlOSS: Text-Source Discovery over the Internet

Abstract

The dramatic growth of the Internet has created a new problem for users: locating the relevant sources of documents. This article presents a framework for this problem, which we call the text-source discovery problem, and experimentally analyzes a solution to it. Our approach consists of two phases. First, each text source exports its contents to a centralized service. Second, users present queries to the service, which returns an ordered list of promising text sources. This article describes GlOSS (Glossary-of-Servers Server) in two versions: bGlOSS, which provides a Boolean query retrieval model, and vGlOSS, which provides a vector-space retrieval model. We also present hGlOSS, a decentralized version of the system. We describe in detail the methodology for measuring the retrieval effectiveness of these systems and provide experimental evidence, based on actual data, that all three systems are highly effective in determining promising text sources for a given query.
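The two-phase approach can be illustrated with a small Python sketch. It is not the article's implementation. It assumes, for illustration only, that each source exports just its document count and per-word document frequencies (phase one), and that the service ranks sources for a conjunctive (AND) Boolean query by estimating how many documents in each source match, treating words as occurring independently (phase two). The names `SourceSummary`, `export_summary`, `estimate_matches`, and `rank_sources` are hypothetical.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class SourceSummary:
    """Hypothetical summary a text source exports to the central service."""
    name: str
    num_docs: int
    doc_freq: dict[str, int] = field(default_factory=dict)  # word -> number of docs containing it


def export_summary(name: str, docs: list[str]) -> SourceSummary:
    # Phase one (assumed): count, for each word, how many documents contain it.
    freq: dict[str, int] = {}
    for doc in docs:
        for word in set(doc.lower().split()):
            freq[word] = freq.get(word, 0) + 1
    return SourceSummary(name, len(docs), freq)


def estimate_matches(summary: SourceSummary, query_words: list[str]) -> float:
    # Estimated number of documents containing ALL query words, assuming
    # words occur independently: num_docs * product of per-word fractions.
    if summary.num_docs == 0:
        return 0.0
    return summary.num_docs * prod(
        summary.doc_freq.get(w.lower(), 0) / summary.num_docs for w in query_words
    )


def rank_sources(summaries: list[SourceSummary], query_words: list[str]) -> list[tuple[str, float]]:
    # Phase two (assumed): keep sources with a nonzero estimate, most promising first.
    scored = [(s.name, estimate_matches(s, query_words)) for s in summaries]
    ranked = [pair for pair in scored if pair[1] > 0]
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))  # ties broken by name
    return ranked


if __name__ == "__main__":
    sources = [
        export_summary("db_a", ["text databases on the internet", "boolean queries", "internet search"]),
        export_summary("db_b", ["cooking recipes", "internet recipes"]),
    ]
    print(rank_sources(sources, ["internet", "search"]))
```

Here `db_a` has 3 documents, 2 containing "internet" and 1 containing "search", so its estimate is 3 × (2/3) × (1/3) = 2/3. `db_b` contains no document with "search", so its estimate is zero and it is left out of the ranking. Exporting only counts, rather than full documents, keeps the central service small; the price is that the independence assumption can over- or underestimate matches when words are correlated.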

DOI: 10.1145/320248.320252


13 Figures and Tables

[Chart: Citations per Year, 1999 to 2017]


Semantic Scholar estimates that this publication has 441 citations based on the available data.


Cite this paper

@article{Gravano1999GlOSSTD,
  title   = {{GlOSS}: Text-Source Discovery over the Internet},
  author  = {Luis Gravano and Hector Garcia-Molina and Anthony Tomasic},
  journal = {ACM Trans. Database Syst.},
  year    = {1999},
  volume  = {24},
  pages   = {229--264},
  doi     = {10.1145/320248.320252}
}