Anthony Tomasic

The dramatic growth of the Internet has created a new problem for users: location of the relevant sources of documents. This article presents a framework for (and experimentally analyzes a solution to) this problem, which we call the text-source discovery problem. Our approach consists of two phases. First, each text source exports its...
Access to large numbers of data sources introduces new problems for users of heterogeneous distributed databases. End users and application programmers must deal with unavailable data sources. Database administrators must deal with incorporating each new data source into the system. Database implementors must deal with the transformation of queries between...
With the proliferation of the world's "information highways," a renewed interest in efficient document indexing techniques has come about. In this paper, the problem of incremental updates of inverted lists is addressed using a new dual-structure index. The index dynamically separates long and short inverted lists and optimizes retrieval, update,...
The performance of distributed text document retrieval systems is strongly influenced by the organization of the inverted index. This paper compares the performance impact on query processing of various physical organizations for inverted lists. We present a new probabilistic model of the database and queries. Simulation experiments determine which variables...
Accessing data from numerous widely distributed sources poses significant new challenges for query optimization and execution. Congestion and failures in the network can introduce highly variable response times for wide-area data access. This paper is an initial exploration of solutions to this variability. We introduce a class of dynamic, run-time query plan...
The popularity of on-line document databases has led to a new problem: finding which text databases (out of many candidate choices) are the most relevant to a user. Identifying the relevant databases for a given query is the text database discovery problem. The first part of this paper presents a practical solution based on estimating the...
The popularity of on-line document databases has led to a new problem: finding which text databases (out of many candidate choices) are the most relevant to a user. Identifying the relevant databases for a given query is the text database discovery problem. The first part of this paper presents a practical solution based on estimating the result size of a query and...
Accessing many data sources aggravates problems for users of heterogeneous distributed databases. Database administrators must deal with fragile mediators, that is, mediators with schemas and views that must be significantly changed to incorporate a new data source. When implementing translators of queries from mediators to data sources, database implementors...
On-line information vendors offer access to multiple databases. In addition, the advent of a variety of INTERNET tools has provided easy, distributed access to many more databases. The result is thousands of text databases from which a user may choose for a given information need (a user query). This paper, an abridged version of [...], presents a framework for and...