The popularity of on-line document databases has led to a new problem: finding which text databases, out of many candidates, are the most relevant to a user's query. Identifying the relevant databases for a given query is the <italic>text database discovery problem</italic>. The first part of this paper presents a practical solution based on estimating the result size of a given query at a given database. The method is termed <italic>GlOSS</italic>, for <italic>Glossary of Servers Server</italic>. The second part of this paper evaluates the effectiveness of <italic>GlOSS</italic> using a trace of real user queries. In addition, we analyze the storage cost of our approach.
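To make the idea of result-size estimation concrete, the following is a minimal sketch and not necessarily the estimator used in the paper. It assumes that each database exports only its document count and per-term document frequencies, that queries are conjunctions of terms, and that terms occur independently of one another. The function names, the `summaries` layout, and the sample numbers are all hypothetical.

```python
from math import prod


def estimate_result_size(query_terms, num_docs, doc_freq):
    """Estimate how many documents contain ALL query terms.

    Assumption (not from the abstract): terms occur independently, so the
    fraction of matching documents is the product of per-term fractions.
    """
    if num_docs == 0 or not query_terms:
        return 0.0
    fraction = prod(doc_freq.get(term, 0) / num_docs for term in query_terms)
    return num_docs * fraction


def rank_databases(query_terms, summaries):
    """Rank databases by estimated result size, dropping zero estimates.

    `summaries` maps a database name to (num_docs, doc_freq), a
    hypothetical layout for the statistics each database would export.
    """
    estimates = [
        (name, estimate_result_size(query_terms, num_docs, doc_freq))
        for name, (num_docs, doc_freq) in summaries.items()
    ]
    useful = [(name, est) for name, est in estimates if est > 0]
    return sorted(useful, key=lambda pair: pair[1], reverse=True)


# Made-up statistics for three hypothetical databases.
summaries = {
    "db_a": (1000, {"database": 200, "discovery": 50}),
    "db_b": (500, {"database": 10}),
    "db_c": (2000, {"database": 100, "discovery": 400}),
}

ranking = rank_databases(["database", "discovery"], summaries)
for name, estimate in ranking:
    print(f"{name}: about {estimate:.1f} matching documents")
```

Under these assumptions, a database is worth contacting only if its estimate is nonzero, and the per-database summary is far smaller than a full index, which is what makes the storage cost a natural quantity to analyze.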